CMPT 726 G1

Introduction to Machine Learning

CMPT 726

Simon Fraser University

Fall 2017


Inactive Links are Under Construction

Course Logistics

Instructor: Oliver Schulte
Office Location: TASC 1 9021.
Office Phone: 778-782-3390.
Office Hours: Thursday 2:30-3:30 pm
E-mail Office Hour: Friday 10-11 am.

Teaching Assistant 1: Nelson Nauata
Office Hours: Wednesday 3:00-4:00pm
Office Location: ASB9808

Teaching Assistant 2: Ruturaj Patel
Office Hours: TBA
Office Location: TBA


Announcements

  • The textbook math appendix is available here.
  • Double Office Hour this Wednesday for Midterm Review

Resources

Lecture Notes

Assignments

  • Assignment 1 text
    • Jar File for UBC Aispace tool
    • Hockey Draft Dataset. For dataset info see the readme. Columns to drop:
      • id, Playername. You always need to drop id columns to do machine learning.
      • Sum_7yr_GP, sum_7yr_TOI, Overall. These columns specify information about a player's future success that is not actually available at draft time.
      • DraftYear. The data are split into training and test sets by draft year, so this feature cannot generalize from training to test data and should not be used in learning.
      • po_PlusMinus. It turns out that the hockey data provider we relied on has unreliable data in this column for the years concerned.
      • country. Because there are too many countries, we grouped the European countries together, as shown in the readme and in the country_group column.
  • Assignment 2 text
    • Using Weka for decision tree learning.
    • Hockey Draft Dataset for this assignment with the following training and test set:
    • Training Set: Use the draft years 2004, 2005, 2006 as training data.
    • Test Set: Use the draft year 2007 as test data.
    • Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, different parts of the assignment may require different preprocessing steps.
    • Update: if you encounter a column with standard deviation 0, you cannot standardize by dividing by the standard deviation. Best to drop it.
    • Sample Code
  • Assignment 3 text. Nov 7: I've changed some notation to be consistent with the text notation.
    • Hockey Draft Dataset for this assignment with the following training and test set:
    • Training Set: Use the draft years 2004, 2005, 2006 as training data.
    • Test Set: Use the draft year 2007 as test data.
    • Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, the assignment specifies other preprocessing steps for different parts.
    • Sample Logistic Regression Code
    • Deep learning.
      • Important Warning: Deep learning can take a long time. Even with a relatively small dataset like ours, a single weight optimization could take hours. I suggest that you run one training session as soon as possible to get a sense of how long it will take on your system. Leaving it for the day before the assignment is due is a recipe for disaster.
      • A variety of deep learning systems are described on the deep learning site, as well as in the Stanford course notes, including the following. Most students have found Keras and Caffe the most user-friendly.
        • Keras
        • Caffe
        • Theano
        • cuda-convnet
        • Torch
        • MATLAB
        • TensorFlow
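
The preprocessing steps listed for Assignments 2 and 3 (split by draft year, drop the listed columns, drop any column with standard deviation 0 before standardizing) can be sketched as follows. This is only an illustrative sketch, not the course's sample code. It uses the Python standard library and a made-up four-row table. The feature names feat_a and feat_const are hypothetical, and two choices are assumptions not stated in the assignment text: using the population standard deviation, and standardizing the test set with training-set statistics.

```python
from statistics import mean, pstdev

# Columns the assignment text says to drop before learning.
DROP_COLUMNS = {"id", "Playername", "sum_7yr_TOI", "DraftYear", "country", "Overall"}
TRAIN_YEARS = {2004, 2005, 2006}
TEST_YEARS = {2007}


def split_by_draft_year(rows):
    """Split rows into (train, test) using the DraftYear column."""
    train = [r for r in rows if r["DraftYear"] in TRAIN_YEARS]
    test = [r for r in rows if r["DraftYear"] in TEST_YEARS]
    return train, test


def drop_columns(rows, columns=DROP_COLUMNS):
    """Return copies of the rows without the listed columns."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


def standardize(train, test):
    """Standardize numeric columns; drop columns with zero standard deviation.

    Assumption: mean and standard deviation come from the training rows only
    and are reused for the test rows.
    """
    out_train = [{} for _ in train]
    out_test = [{} for _ in test]
    dropped = []
    for col in train[0]:
        values = [r[col] for r in train]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            # Cannot divide by zero, so the column is dropped.
            dropped.append(col)
            continue
        for src, dst in ((train, out_train), (test, out_test)):
            for row, out in zip(src, dst):
                out[col] = (row[col] - mu) / sigma
    return out_train, out_test, dropped


# Made-up rows for illustration only; feat_a and feat_const are hypothetical.
rows = [
    {"id": 1, "Playername": "A", "DraftYear": 2004, "country": "CAN",
     "Overall": 10, "sum_7yr_TOI": 100, "feat_a": 1.0, "feat_const": 5.0},
    {"id": 2, "Playername": "B", "DraftYear": 2005, "country": "USA",
     "Overall": 20, "sum_7yr_TOI": 200, "feat_a": 2.0, "feat_const": 5.0},
    {"id": 3, "Playername": "C", "DraftYear": 2006, "country": "SWE",
     "Overall": 30, "sum_7yr_TOI": 300, "feat_a": 3.0, "feat_const": 5.0},
    {"id": 4, "Playername": "D", "DraftYear": 2007, "country": "FIN",
     "Overall": 40, "sum_7yr_TOI": 400, "feat_a": 4.0, "feat_const": 5.0},
]

# Split first (DraftYear is still needed), then drop columns, then standardize.
train_rows, test_rows = split_by_draft_year(rows)
train_rows, test_rows = drop_columns(train_rows), drop_columns(test_rows)
train_std, test_std, dropped = standardize(train_rows, test_rows)
```

Note the order: the split has to happen before DraftYear is dropped, since that column defines the training and test sets.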

Final Project Information

Exam Resources

Broken links are under construction

  1. Our Exam.
    1. Exam Information. Breakdown of topics. You can use a cheat sheet: one 8.5 x 11 sheet of notes for reference, both sides. Updated Oct 26, 2017.
    2. Exam 1 Instructions. Under construction.
  2. Other Exams

Learning

Journals and conferences

Updated Tue Dec. 12 2017, 14:12 by oschulte.