CMPT 726 G1

Introduction to Machine Learning

CMPT 726

Simon Fraser University

Fall 2017

Inactive Links are Under Construction

Course Logistics

Instructor: Oliver Schulte
Office Location: TASC 1 9021.
Office Phone: 778-782-3390.
Office Hours: Thursday 2:30-3:30 pm
E-mail Office Hour: Friday 10-11 am.

Teaching Assistant 1: Nelson Nauata
Office Hours: Wednesday 3:00-4:00 pm
Office Location: ASB9808

Teaching Assistant 2: Ruturaj Patel
Office Hours: TBA
Office Location: TBA

Announcements

The textbook math appendix is available here.
Double Office Hour this Wednesday for Midterm Review

Resources

Course Schedule. Updated Nov 8.
Schedule for Guest Lecture by Dr. Anton Smessaert
Course Syllabus
Textbook website
The 24x7 on-line book collection, for reference material.
Installing Canvas Mobile. You need to install Canvas to do the in-class surveys. You can also do the on-line quizzes this way.
SFU Medical Excuse Form.
Students who want to use GPUs can access some in CSIL and in the Big Data lab. Greg Baker has provided GPU instructions.
Powerpoint Slides on the Weka Interface. The Weka GUI is pretty much self-explanatory, but if you have questions about it, this presentation has a lot of detail.

Lecture Notes

Excel spreadsheet with Example Computations
Probability Concepts, Probabilistic Reasoning. Updated Sep 12.
Bayesian Networks
Learning Bayesian Network Parameters: Discrete Variables Only
- Learning Bayesian Network Parameters The Bayesian Way. Optional material.
Learning Decision Trees
The Naive Bayes Classifier
The Gaussian Distribution. Learning Bayesian network parameters for a continuous variable with no parents.
Linear Regression. Learning Bayesian network parameters for a continuous variable with many parents. Updated Oct 17, 2007.
Linear Classification. Learning Bayesian network parameters for a discrete variable with continuous parents.
Neural Nets. Updated Nov 7. And here's my short intro to neural net architectures.
Nearest Neighbour Methods
Support Vector Machines
Ensemble Learning
EM: Parameter Learning With Unobserved Variables
Component Analysis
Convolutional Neural Networks
Sequential Data

Assignments

Assignment 1 text
- Jar File for UBC Aispace tool
- Hockey Draft Dataset. For dataset info see the readme. Columns to drop:
  - id, Playername. You always need to drop id columns to do machine learning.
  - Sum_7yr_GP, sum_7yr_TOI, Overall. These columns specify information about a player's future success that is not actually available at draft time.
  - DraftYear. We split the data into training and test sets by draft year, so anything learned from this feature cannot generalize from training to test data. Do not use it in learning.
  - po_PlusMinus. Turns out that the hockey data provider we relied on has unreliable data in this column for the years concerned.
  - country. Because there are too many countries, we grouped the European countries together, as shown in the readme and in the country_group column.
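As a rough sketch of the column dropping listed above, assuming the dataset has been read into a list of row dictionaries (for example with Python's csv.DictReader). The helper name drop_columns, the Weight column, and all sample values are made up for illustration. Column names are matched case-insensitively here because the capitalization on this page may not match the file header exactly.

```python
# Columns listed above for removal. Capitalization is copied from this page
# and may differ from the actual file header, hence the case-insensitive match.
COLUMNS_TO_DROP = {
    "id", "Playername",                       # identifier columns
    "Sum_7yr_GP", "sum_7yr_TOI", "Overall",   # not available at draft time
    "DraftYear",                              # used only to split the data
    "po_PlusMinus",                           # unreliable for these years
    "country",                                # replaced by country_group
}

def drop_columns(rows, columns=COLUMNS_TO_DROP):
    """Return copies of the row dicts without the listed columns."""
    lowered = {name.lower() for name in columns}
    return [
        {key: value for key, value in row.items() if key.lower() not in lowered}
        for row in rows
    ]

# Hypothetical one-row example; the values are not taken from the real dataset.
sample_rows = [{
    "id": "1", "Playername": "Example Player", "DraftYear": "2004",
    "country": "SWE", "country_group": "EURO",
    "sum_7yr_GP": "12", "Weight": "190",
}]
cleaned = drop_columns(sample_rows)
print(cleaned)
```

If the data is to be split by draft year, do the split before calling drop_columns, since DraftYear is removed here.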
Assignment 2 text
- Using Weka for decision tree learning.
- Hockey Draft Dataset for this assignment with the following training and test set:
- Training Set: Use the draft years 2004, 2005, 2006 as training data.
- Test Set: Use the draft year 2007 as test data.
- Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, different parts of the assignment may require different preprocessing steps.
- Update: if you encounter a column with standard deviation 0, you cannot standardize by dividing by the standard deviation. Best to drop it.
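The split and the standardization update above could be sketched roughly as follows, using only the Python standard library. This is an illustration, not the assignment's reference code: the function names, the Height and Constant columns, and the sample values are hypothetical. Computing the mean and standard deviation on the training set only, and using the population standard deviation, are assumptions as well.

```python
from statistics import mean, pstdev

TRAIN_YEARS = {2004, 2005, 2006}
TEST_YEARS = {2007}

def split_by_draft_year(rows):
    """Split row dicts into (train, test) using the DraftYear column."""
    train = [r for r in rows if int(r["DraftYear"]) in TRAIN_YEARS]
    test = [r for r in rows if int(r["DraftYear"]) in TEST_YEARS]
    return train, test

def standardize(train, test, features):
    """Standardize numeric features with training-set mean and std.

    Features whose training standard deviation is 0 are dropped, because
    dividing by 0 is undefined. Returns (train_matrix, test_matrix, kept).
    """
    stats = {}
    for name in features:
        values = [float(r[name]) for r in train]
        sd = pstdev(values)  # assumption: population standard deviation
        if sd > 0:
            stats[name] = (mean(values), sd)
    kept = list(stats)

    def transform(rows):
        return [
            [(float(r[name]) - stats[name][0]) / stats[name][1] for name in kept]
            for r in rows
        ]

    return transform(train), transform(test), kept

# Hypothetical toy rows; "Constant" has standard deviation 0 and gets dropped.
rows = [
    {"DraftYear": "2004", "Height": "70", "Constant": "1"},
    {"DraftYear": "2005", "Height": "72", "Constant": "1"},
    {"DraftYear": "2006", "Height": "74", "Constant": "1"},
    {"DraftYear": "2007", "Height": "72", "Constant": "1"},
]
train, test = split_by_draft_year(rows)
X_train, X_test, kept = standardize(train, test, ["Height", "Constant"])
print(kept, X_train, X_test)
```

The split uses DraftYear, so it has to happen before that column is dropped.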
- Sample Code
  - Python Code Hints
  - Matlab Code Hints
Assignment 3 text. Nov 7: I've changed some notation to be consistent with the text notation.
- Hockey Draft Dataset for this assignment with the following training and test set:
- Training Set: Use the draft years 2004, 2005, 2006 as training data.
- Test Set: Use the draft year 2007 as test data.
- Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, the assignment specifies other preprocessing steps for different parts.
- Sample Logistic Regression Code
  - Python Code Hints
  - Matlab Code Hints
- Deep learning.
  - Important Warning: Deep learning can take a long time. Even with a relatively small dataset like ours, a single weight optimization could take hours. I suggest that you try to perform one training session as soon as possible to get a sense of how long it will take on your system. Leaving it for the day before the assignment is due is a recipe for disaster.
  - A variety of deep learning systems are described on the deep learning site, as well as in the Stanford course notes, including the following. Most students have found Keras and Caffe the most user-friendly.
    - Keras
    - Caffe
    - Theano
    - cuda-convnet
    - Torch
    - Matlab
    - TensorFlow
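One way to act on the timing warning above is to time a single pass over the training data and extrapolate. The sketch below is framework-independent and uses only the standard library. The names estimate_training_time and train_one_epoch are hypothetical, and the linear extrapolation is an assumption; the first epoch is often slower than later ones because of setup costs.

```python
import time

def estimate_training_time(train_one_epoch, planned_epochs):
    """Time one epoch and extrapolate linearly to the planned number of epochs.

    train_one_epoch is any zero-argument callable that runs a single pass
    over the training data in whichever framework you chose.
    Returns (seconds_per_epoch, estimated_total_seconds).
    """
    start = time.perf_counter()
    train_one_epoch()
    seconds_per_epoch = time.perf_counter() - start
    return seconds_per_epoch, seconds_per_epoch * planned_epochs

# Placeholder standing in for a real training step (an assumption, not real training).
def dummy_epoch():
    time.sleep(0.01)

per_epoch, total = estimate_training_time(dummy_epoch, planned_epochs=100)
print(f"about {per_epoch:.3f} s per epoch, roughly {total:.1f} s for 100 epochs")
```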

Final Project Information

Project Format and Topics
Latex Poster Template
Powerpoint Poster Template
Students who want to use GPUs can access some in CSIL and in the Big Data lab. Greg Baker has provided GPU instructions.

Exam Resources

Broken links are under construction

Our Exam.
1. Exam Information. Breakdown of topics. You can use a cheat sheet: 1 (one) 8.5 x 11 sheet of notes for reference, both sides. Updated Oct 26, 2017.
2. Exam 1 Instructions. Under construction.
Other Exams
