Project

This project is to be completed individually.

Topic Options

There is a page of project ideas that may inspire you, or you can work on a topic of your own choice. Or, find data that seems interesting to you and start asking yourself what can be learned from it.

If you would like to discuss your project topic, the best option is to come to one of the weekly CSIL workshop times. You could also post a private message on the discussion forum, but it can be harder to discuss in that medium.

Topic Cautions

  • Try to keep data at the centre: other techniques (NLP, image processing, etc) are welcome but shouldn't be the main focus of the project.
  • If your topic is the result of asking Google/ChatGPT for a "good data science project idea", or if many already-completed versions of your project can be found by searching for similar keywords in Google or on GitHub, it's not a good topic. Find something more original.
  • Overly-clean single-purpose data (like from Kaggle) is also boring. Either find some data where some work is needed to figure it out, or combine multiple data sources.
  • If you need to collect data, ensure as soon as possible that you can actually get it (i.e. it is in files on your computer, not that you have a vague plan to get it). If you don't have the data fairly early in the project, consider changing topics: the danger of having an idea but no data is high.

Requirements

The project is worth 22% of the final mark: this should be some guide to the scope. That's as much as six weekly exercises.

  • At least one of:
    • a question that requires significant data collection or cleaning or non-obvious interpretations of existing data;
    • data sets from different sources that can be combined in a non-obvious way (i.e. probably not simply JOINed);
    • several questions about a single data set that require distinct techniques.
  • Optional, but would soften the previous requirement:
    • some significant use of big data tools.
  • Optional, but recommended:
    • Imagine a client who has contracted you. What questions would they have, or what problems might they be trying to solve with this data? Use this imagined scenario to guide your project.
  • Apply the concepts and techniques covered in the course (as relevant to the topic) in a way that would produce useful results.
  • Not requirements: a specific p value or accuracy score.
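
As an illustration of combining sources in a way that is not a plain JOIN: two data sets may share no common key but can still be related by time. The following is only a minimal sketch under made-up assumptions: the function name nearest_match, the 30-minute tolerance, and the weather/bike-trip data are all hypothetical and not part of the project requirements.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_match(readings, events, tolerance=timedelta(minutes=30)):
    """Pair each event with the reading closest in time, within `tolerance`.

    `readings` is a list of (timestamp, value) sorted by timestamp;
    `events` is a list of (timestamp, label). Both shapes are assumptions
    made for this sketch.
    """
    times = [t for t, _ in readings]
    matched = []
    for when, label in events:
        i = bisect_left(times, when)
        # Candidates are the readings just before and just after the event.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - when), default=None)
        if best is not None and abs(times[best] - when) <= tolerance:
            matched.append((label, readings[best][1]))
        else:
            # No reading close enough: keep the event but mark it unmatched.
            matched.append((label, None))
    return matched


# Hypothetical data: hourly temperature readings and two bike trips.
readings = [
    (datetime(2025, 6, 1, 9, 0), 14.5),
    (datetime(2025, 6, 1, 10, 0), 16.0),
    (datetime(2025, 6, 1, 11, 0), 18.2),
]
events = [
    (datetime(2025, 6, 1, 9, 50), "bike trip A"),
    (datetime(2025, 6, 1, 13, 0), "bike trip B"),
]
result = nearest_match(readings, events)
```

Here trip A is paired with the 10:00 reading and trip B is left unmatched because no reading falls within the tolerance. If you are working in pandas, merge_asof with direction="nearest" and a tolerance covers a similar case.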

For the exercises, it should be clear that you're being guided very carefully: I have worked through the problems and have a very good idea how you will solve them.

That is not true for the project. The direction you take is up to you: we may be able to help you, or we may have no idea what problem you're encountering.

Code

You will be submitting a tag to a Git repository on SFU's GitHub server containing your code for the project.

In your repository's README.md file, you should document your code and how to run it: required libraries, commands (and arguments), etc. Do this because it is always good practice, not because we are realistically going to spend the time to run your code.

Please also commit either the input data (if it's small and you can distribute it) or one or two small sample input files in the format your code expects (if relevant). Think of these files as documenting the input, not necessarily as ready-to-run data.
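
If your input is a large CSV file, one possible way to produce a small sample file is sketched below. It assumes a CSV with a single header row; the function name write_sample and the default of 20 rows are hypothetical choices, and other formats would need a different approach.

```python
import csv
from itertools import islice


def write_sample(source, dest, n_rows=20):
    """Copy the header and the first `n_rows` data rows of a CSV file.

    Returns the number of data rows written. Assumes the first row of
    `source` is a header (an assumption of this sketch).
    """
    with open(source, newline="") as src, open(dest, "w", newline="") as out:
        reader = csv.reader(src)
        writer = csv.writer(out)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to sample
        writer.writerow(header)
        count = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
        return count
```

For example, write_sample("full_data.csv", "sample_input.csv") would write a header plus at most 20 rows (both file names are placeholders). Taking the first rows keeps the sketch simple; a random sample may represent the data better if the file is ordered.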

Make sure you add the instructor and TAs (ggbaker, sma318, fma44, wta55) as collaborators on the project as well so we can see your code to mark it.

Report

You should write a report of 3–5 pages (reasonably-standard formatting, single-spaced, including figures).

When writing the report, perhaps imagine you have been asked to do this analysis as part of a job, and your audience is your coworkers and manager: you should address technical aspects of the project and how you got your results in a way that's accessible to a technically-literate person. On the other hand, it shouldn't be too jargon-heavy.

In that report, you should address (as relevant):

  • The problem you are addressing, particularly how you refined the provided idea.
  • The data that you used: how it was gathered, cleaned, etc.
  • Techniques you used to analyse the data.
  • Your results/findings/conclusions.
  • Some appropriate visualization of your data/results. Have a look at The Python Graph Gallery for less-common but possibly more interesting visualizations.
  • Limitations: problems you encountered, things you would do if you had more time, things you should have done in retrospect, etc.

On report length, since people often ask: in general, if you're asked for content of a certain length (presentation, report, etc) and you produce something notably different, it doesn't meet the spec and is worse for it. Deciding what is important to say is a critical part of communicating. The report is the main way we will learn about your project, and you need to get to the point, not ramble on and give every possible thought you've had. The TAs aren't going to be too strict with the length, but if you annoy them, that's a bad report.

Project Experience Summary

In your report, include (in addition to the above page restriction) an overview of the project as you would include it as project experience on your resumé. Co-op calls this an “accomplishment statement” and it might go on a resumé under a heading like “Project Experience”.

When writing content for your resumé, focus on your skills, education, experience, and knowledge as accomplishments, rather than duties and tasks. Accomplishment statements could also be phrased by beginning with the result, rather than ending with it.

SFU Co-op has some guidelines on how to write accomplishment statements.

Grading

Criteria for marking the projects will include:

  • acquiring/cleaning the data;
  • defining/refining the problem;
  • data analysis;
  • how well you explained the whole thing.

Submitting

Submit your files through CourSys for the Project activity.

Updated Wed June 25 2025, 13:33 by ggbaker.