CMPT 726: Machine Learning

CMPT 726: Machine Learning Project

The intent of the course project is to give you some practice at doing machine learning research. I am open to your own projects and ideas, as long as you use machine learning in a meaningful way. If you would like to discuss it in advance, I suggest that you come to my office hour.

Logistics

You must work with a group.

Minimum Group Size = 2. Maximum Group Size = 5.
Deadline for forming groups: Nov 16, 2017. Students who have not formed groups by then will be assigned to one by us. Please create a group on courses.cs.sfu.ca so we can record grades for your group.
You should seek feedback on your project as soon as possible. You can come to my office hour to discuss it. Submit a brief outline of your project, one paragraph long. The outline is due Nov 23 and is worth 1%.
There will be a poster session. Participation by your group is optional. If you do present a poster, your project grade will be weighted as 30% poster, 70% report. Otherwise it will be 100% report.

Methodology

The key components on which you will be graded are:

Choosing the right problem. Ideally you will have a problem from your current/potential research area which could benefit from the use of machine learning techniques. Please feel free to use this problem for your project. However, you must not submit work you have done before this course as your project.
If you haven't decided on a research area, or would like to work on something different, that is fine too. A great resource for datasets to work on is the UCI repository.
Don't choose something that is too hard or too simple. If in doubt, please come to my office hours and ask about your topic. A rough guideline for grad projects is that they should be approximately 2 times as much work as one assignment.
What has been done before? A month in the lab can save you a day in the library. This is a course project, and not a peer-reviewed paper, but you should be aware of the most closely related work. In fact, a perfectly good project is to implement a previous paper (of non-trivial complexity). I expect roughly 3-5 citations to other work as part of your project report.
You must also maintain high standards of academic integrity. Standing on the shoulders of giants is highly recommended; just make it clear who these giants are. If you use someone else's code, you must provide a citation. If you use text/equations from someone else's paper, you must cite and quote it. If you use figures from another paper, you must clearly say so.
Comparative experiments. You must compare what you have done to at least one other method to know if anything interesting has been achieved. Proper experiments should only change one component at a time (e.g. different classifier, same features). You should also study different parameters of algorithms to ascertain sensitivity (e.g. regularization parameter values). If you are using a standard dataset, you can compare your results (one method) to others'. Just make sure the experiments are comparable (e.g. same training/test data).
You will not be graded on the quality of your results, but on the quality of your experimental methodology.
Quality of exposition. If you write a paper and nobody can read it, does it make a contribution? Clearly state the problem you worked on, the methods you used, who has done what before, the intent of your project, and which datasets and parameters you used. Use a spell-checker, and provide figures visualizing your results with legible fonts and labelled axes.
A standard project report has four sections:
1. Introduction (includes citations to closely related work)
2. Approach
3. Experiments
4. Conclusion
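
The comparative-experiments guideline above can be illustrated with a minimal sketch. It is only an example under stated assumptions, not a required template: the synthetic data, the function names (`make_data`, `fit_ridge`, `run_experiment`), and the choice of 1-D ridge regression are all hypothetical. It fixes one train/test split, compares a method against a trivial baseline on that same split, and varies only the regularization parameter to check sensitivity.

```python
import random


def make_data(n, seed):
    """Synthetic 1-D regression data, y = 2x + noise (illustrative assumption)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept:
    w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def run_experiment(seed=0):
    xs, ys = make_data(200, seed)
    # One fixed split, shared by every method, so results are comparable.
    train_x, test_x = xs[:150], xs[150:]
    train_y, test_y = ys[:150], ys[150:]

    results = {}

    # Baseline: always predict the mean of the training targets.
    mean_y = sum(train_y) / len(train_y)
    results["baseline_mean"] = mse([mean_y] * len(test_y), test_y)

    # Same data, same model family; only the regularization strength changes.
    for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
        w = fit_ridge(train_x, train_y, lam)
        results[f"ridge_lam={lam}"] = mse([w * x for x in test_x], test_y)
    return results


if __name__ == "__main__":
    for name, err in run_experiment().items():
        print(f"{name:>16s}  test MSE = {err:.4f}")
```

In a real project you would typically choose the regularization value on a separate validation set (or by cross-validation) and report test error only for the chosen setting; the sweep here is only meant to show sensitivity.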

Types of Topics

Applications to specific problems. I expect this to be the most common format. You could apply an existing machine learning algorithm to a problem of interest to you. There would be value also in implementing modifications of existing algorithms if necessary for your application.
A survey or synthesis of a few related papers on a topic of interest to you. For example, you could summarize Bayesian approaches to curve fitting or explore new topics like Gaussian processes.
A theoretical research project. This might look at mathematical questions, e.g. proving performance guarantees for machine learning algorithms, or deriving methods from assumptions.
Implementing Algorithms.
Other listings of course projects from other universities. These might give you some ideas. Doing a web search of your own is fine.

CMU 1998
CMU 2007. This one contains datasets as well.
Kaggle competitions have real-world challenge problems and data sets. You can compare your system with others. If you enter this year's competition, you could win big bucks! Just managing to post a reasonable entry would be enough for a course project; you don't have to win. Kaggle competitors often post what they've tried, sometimes called Kaggle kernels. You can use these methods as baselines for your project.
The KDD Cup provides clean real-world datasets. These have been analyzed many times so it should be easy for you to find comparison points.
I'm also working on machine learning for relational databases and network data. Let me know if you are interested in projects related to that.

Grading Criteria for Final Project.

30% Presentation. Clarity, conciseness, spelling (quality of exposition).
40% Originality. To what extent were you creative in developing your own ideas?
30% Evaluation, methodology.

Handing in the Project.

Prepare a project report using the NIPS style files.
- The page limit is 7 pages in the NIPS format. This does not count references, but it counts everything else (figures, plots, code fragments, ...). Your report must include a sub-section called "Contributions" which states what each group member did in the project.
The report is due at 11:59pm on Tuesday December 5. Because of my grade submission deadline, the project submission deadline will be horribly strict - no grace days.
There will be a poster session at 10:30am on Nov 30. Participation by a group is optional.
You must submit the report electronically on the submission server in PDF format.