CMPT 726 G1
Introduction to Machine Learning
CMPT 726
Simon Fraser University
Fall 2017
Inactive Links are Under Construction
Course Logistics
Instructor: Oliver Schulte
Office Location: TASC 1 9021.
Office Phone: 778-782-3390.
Office Hours: Thursday 2:30-3:30 pm
E-mail Office Hour: Friday 10-11 am.
Teaching Assistant 1: Nelson Nauata
Office Hours: Wednesday 3:00-4:00pm
Office Location: ASB9808
Teaching Assistant 2: Ruturaj Patel
Office Hours: TBA
Office Location: TBA
Announcements
- The textbook math appendix is available here.
- Double Office Hour this Wednesday for Midterm Review
Resources
- Course Schedule. Updated Nov 8.
- Schedule for Guest Lecture by Dr. Anton Smessaert
- Course Syllabus
- Textbook website
- The 24x7 on-line book collection, for reference material.
- Installing Canvas Mobile. You need to install Canvas to do the in-class surveys. You can also do the online quizzes this way.
- SFU Medical Excuse Form.
- If you want to use GPUs, you can access some in CSIL and in the Big Data lab. Greg Baker has provided GPU instructions.
- PowerPoint Slides on the Weka Interface. The Weka GUI is largely self-explanatory, but if you have questions about it, this presentation goes into a lot of detail.
Lecture Notes
- Excel spreadsheet with Example Computations
- Probability Concepts, Probabilistic Reasoning. Updated Sep 12.
- Bayesian Networks
- Learning Bayesian Network Parameters: Discrete Variables Only
- Learning Bayesian Network Parameters the Bayesian Way. Optional material.
- Learning Decision Trees
- The Naive Bayes Classifier
- The Gaussian Distribution: learning Bayesian network parameters for a continuous variable with no parents.
- Linear Regression: learning Bayesian network parameters for a continuous variable with many parents. Updated Oct 17, 2007.
- Linear Classification: learning Bayesian network parameters for a discrete variable with continuous parents.
- Neural Nets. Updated Nov 7. Here is also my short intro to neural net architectures.
- Nearest Neighbour Methods
- Support Vector Machines
- Ensemble Learning
- EM: Parameter Learning With Unobserved Variables
- Component Analysis
- Convolutional Neural Networks
- Sequential Data
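Several of the lecture notes above present a model as a way of learning Bayesian network parameters. As a rough illustration only (not the course's own code or notation), the following plain-Python sketch computes maximum-likelihood estimates for three of those cases: a discrete child with a discrete parent (relative frequencies), a continuous variable with no parents (Gaussian mean and variance), and a continuous variable with one continuous parent (least-squares linear regression). The function names are hypothetical, and the regression case is limited to a single parent to keep it short; with several parents the same idea becomes multiple linear regression.

```python
from collections import Counter
from statistics import fmean


def discrete_cpt(child, parent):
    """ML estimate of P(child | parent) for discrete variables (hypothetical helper).

    Returns a dict mapping (parent_value, child_value) to a relative frequency.
    """
    joint = Counter(zip(parent, child))      # counts of each (parent, child) pair
    parent_counts = Counter(parent)          # counts of each parent value
    return {(p, c): n / parent_counts[p] for (p, c), n in joint.items()}


def gaussian_mle(x):
    """ML estimate of a Gaussian for a continuous node with no parents."""
    mu = fmean(x)
    var = fmean([(xi - mu) ** 2 for xi in x])  # ML variance divides by n, not n - 1
    return mu, var


def linear_gaussian_mle(x, y):
    """ML estimate for a continuous node y with one continuous parent x.

    Model: y ~ N(w0 + w1 * x, sigma2); w0 and w1 are the least-squares fit.
    """
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    w1 = sxy / sxx
    w0 = my - w1 * mx
    # ML noise variance: mean squared residual (again dividing by n)
    sigma2 = fmean([(yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y)])
    return w0, w1, sigma2
```

For example, `gaussian_mle([1.0, 2.0, 3.0])` returns a mean of 2.0 and an ML variance of 2/3, slightly smaller than the unbiased (n - 1) estimate of 1.0.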
Assignments
- Assignment 1 text
- Jar File for UBC Aispace tool
- Hockey Draft Dataset. For dataset info see the readme. Columns to drop:
- id, Playername. You always need to drop id columns to do machine learning.
- Sum_7yr_GP, sum_7yr_TOI, Overall. These columns specify information about a player's future success that is not actually available at draft time.
- DraftYear. We split the data into training and test sets by draft year, so this feature cannot generalize from training to test data and should not be used in learning.
- po_PlusMinus. It turns out that the hockey data provider we relied on has unreliable data in this column for the years concerned.
- country. Because there are too many countries, we grouped the European countries together, as shown in the readme and in the country_group column.
- Assignment 2 text
- Using Weka for decision tree learning.
- Hockey Draft Dataset for this assignment with the following training and test set:
- Training Set: Use the draft years 2004, 2005, 2006 as training data.
- Test Set: Use the draft year 2007 as test data.
- Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, different parts of the assignment may require different preprocessing steps.
- Update: if you encounter a column with standard deviation 0, you cannot standardize it by dividing by the standard deviation. It is best to drop such a column.
- Sample Code
- Assignment 3 text. Nov 7: I've changed some notation to be consistent with the text notation.
- Hockey Draft Dataset for this assignment with the following training and test set:
- Training Set: Use the draft years 2004, 2005, 2006 as training data.
- Test Set: Use the draft year 2007 as test data.
- Data Preprocessing: Drop the following columns: id, Playername, sum_7yr_TOI, DraftYear, country, Overall. In addition, the assignment specifies other preprocessing steps for different parts.
- Sample Logistic Regression Code
- Deep learning.
- Important Warning: Deep learning can take a long time. Even with a relatively small dataset like ours, a single weight optimization could take hours. I suggest that you run one training session as soon as possible to get a sense of how long it will take on your system. Leaving it until the day before the assignment is due is a recipe for disaster.
- A variety of deep learning systems are described on the deep learning site, as well as in the Stanford course notes, including the following. Most students have found Keras and Caffe the most user-friendly.
- Keras
- Caffe
- Theano
- cuda-convnet
- Torch
- Matlab
- TensorFlow
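The preprocessing steps listed above for Assignments 2 and 3 (split by draft year, drop the listed columns, standardize, drop zero-variance columns) can be sketched in plain Python as follows. This is a minimal illustration, not the provided sample code. It assumes the CSV has been loaded as a list of dicts (for example with `csv.DictReader`) and that DraftYear values parse as integers; the names `preprocess` and `keep_raw` (columns such as a target that should be kept but not standardized) are hypothetical.

```python
from statistics import fmean, pstdev

# Columns to drop for Assignments 2 and 3, as listed on this page.
DROP = {"id", "Playername", "sum_7yr_TOI", "DraftYear", "country", "Overall"}
TRAIN_YEARS = {2004, 2005, 2006}
TEST_YEARS = {2007}


def is_number(value):
    """True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def without_dropped(row):
    """Copy of a row without the dropped columns."""
    return {k: v for k, v in row.items() if k not in DROP}


def preprocess(rows, keep_raw=()):
    """Hypothetical helper: returns (train, test) as lists of dicts.

    Assumes DraftYear values are integer strings such as "2004".
    """
    # Split on DraftYear first, since DraftYear itself is dropped afterwards.
    train = [without_dropped(r) for r in rows if int(r["DraftYear"]) in TRAIN_YEARS]
    test = [without_dropped(r) for r in rows if int(r["DraftYear"]) in TEST_YEARS]

    columns = list(train[0]) if train else []
    for col in columns:
        # Leave targets and non-numeric columns (e.g. country_group) unchanged.
        if col in keep_raw or not all(is_number(r[col]) for r in train + test):
            continue
        values = [float(r[col]) for r in train]
        mu, sd = fmean(values), pstdev(values)
        if sd == 0:
            # Standard deviation 0: cannot divide by it, so drop the column.
            for r in train + test:
                del r[col]
            continue
        # Standardize both sets with the training mean and standard deviation.
        for r in train + test:
            r[col] = (float(r[col]) - mu) / sd
    return train, test
```

Using only training statistics for both sets keeps information from the 2007 test year out of the learned model. For Assignment 1, the longer drop list given there can be substituted for `DROP`.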
Final Project Information
- Project Format and Topics
- LaTeX Poster Template
- PowerPoint Poster Template
Exam Resources
Broken links are under construction
- Our Exam.
- Exam Information. Breakdown of topics. You may use a cheat sheet: one 8.5 x 11 sheet of notes for reference, both sides. Updated Oct 26, 2017.
- Exam 1 Instructions. Under construction.
- Other Exams
Links
Learning
- VideoLectures has a large collection of ML seminars and tutorials. I recommend in particular the following two.
- Andrew Ng's machine learning course. This is an introductory course aimed at undergraduates. You may have to enrol. There is an Octave Tutorial, which is similar to Matlab.
- A higher-level introduction by Daphne Koller to statistical graphical models. You may have to enrol.
- Andrew Moore's tutorials. Introductory slides on many of the topics we discuss.
- Stephen Boyd's Convex Optimization course at Stanford (videos available).
- Useful Things to Know About Machine Learning. By Pedro Domingos.
Journal and conferences
- Journal of Machine Learning Research (JMLR)
- Neural Information Processing Systems (NIPS)
- International Conference on Machine Learning (ICML) 2017
- Uncertainty in Artificial Intelligence (UAI)
- Artificial Intelligence and Statistics (AISTATS) 2017
- IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Updated Tue Dec. 12 2017, 14:12 by oschulte.