The intent of the course project is to give you some practice at
doing machine learning research. I am open to your own projects
and ideas, as long as you use machine learning in a meaningful way.
If you would like to discuss it in advance, I suggest that you come to
my office hours.
- Choosing the right problem. Ideally you will have a problem
from your current/potential research area which could benefit from the
use of machine learning techniques. Please feel free to use this
problem for your project. However, you must not submit work you have
done before this course as your project.
If you haven't decided on a research area, or would like to work on
something different, that is fine too. A great resource for datasets
to work on is the UCI Machine Learning
Repository.
Choose something that is neither too hard nor too simple. If in
doubt, please come to my office hours and ask about your topic. A
rough guideline for grad projects is that they should be approximately
twice as much work as one assignment.
- What has been done before? A month in the lab can save you
a day in the library. This is a course project, and not a
peer-reviewed paper, but you should be aware of the most closely
related work. In fact, a perfectly good project is to implement a
previous paper (of non-trivial complexity). I expect roughly 3-5
citations to other work as part of your project report.
You must also maintain high standards of academic integrity.
Standing on the shoulders of giants is highly recommended; just make
it clear who these giants are. If you use someone else's code, you
must provide a citation. If you use text/equations from someone
else's paper, you must cite and quote it. If you use figures from
another paper, you must clearly state where they come from.
- Comparative experiments. You must compare what you have
done to at least one other method to know if anything interesting has
been achieved. Proper experiments should only change one component at
a time (e.g. different classifier, same features). You should also
study different parameters of algorithms to ascertain sensitivity
(e.g. regularization parameter values). If you are using a standard
dataset, you can compare the results of your method to results
published by others. Just make sure the experiments are comparable
(e.g. same training/test data).
You will not be graded on the quality of your results, but
on the quality of your experimental methodology.
- Quality of exposition. If you write a paper and nobody can
read it, does it make a contribution? Clearly state the problem
you worked on, the intent of your project, the methods you used,
who has done what before, and which datasets and parameters you used.
Use a spell-checker, and provide figures that visualize your results,
with legible fonts and labelled axes.
A standard project report has four sections:
- Introduction (includes citations to closely related work)
- Approach
- Experiments
- Conclusion
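To make the comparative-experiments guideline above concrete, here is a minimal sketch of one possible experimental setup. Everything in it is an assumption made for illustration: the synthetic data, the choice of a mean-prediction baseline versus 1-D ridge regression, and all function names (`make_data`, `fit_ridge`, `run_experiment`, and so on) are hypothetical and not part of the course material. It shows a single fixed training/test split shared by every method, a comparison against one other method, and a sweep over the regularization parameter with everything else held fixed.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D regression data: y = 2x + 1 + noise (purely illustrative)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def fit_mean_baseline(xs, ys):
    """Baseline method: ignore x and always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_ridge(xs, ys, lam):
    """1-D ridge regression with an unpenalized intercept, in closed form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # lam shrinks the slope toward zero
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiment(seed=0):
    """Compare the baseline and ridge regression on one fixed split."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    # One fixed split, shared by every method so the numbers are comparable.
    train_x, train_y = make_data(200, rng)
    test_x, test_y = make_data(100, rng)

    baseline = fit_mean_baseline(train_x, train_y)
    results = {"mean baseline": mse(baseline, test_x, test_y)}

    # Sensitivity study: vary only the regularization parameter.
    for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
        model = fit_ridge(train_x, train_y, lam)
        results[f"ridge lam={lam}"] = mse(model, test_x, test_y)
    return results


if __name__ == "__main__":
    for name, err in run_experiment().items():
        print(f"{name:>16s}  test MSE = {err:.4f}")
```

The sweep here only reports how sensitive the test error is to the regularization parameter. If you go on to pick a single value to report, select it on a separate validation set rather than on the test data.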