This page isn't meant to be a definitive guide to working with the Hadoop tools, but it should give you enough to get up and running for the course.
Of course, you are welcome to use your own computer for your work in this course. You can also connect remotely to the workstations in the physical labs.
- Platform: setting up your environment, depending on your OS.
- LabSpace: the lab/work space we have for the course.
- PythonSpark: working with Python and Spark code.
- SparkSkeleton: a reasonable skeleton for a Spark application.
- CompilingHadoop: compiling Hadoop code on your machine (command-line version).
- Cluster: working with the cluster we have available for this course.
- CompilingHadoop: also gives basic commands to get MapReduce jobs running.
- HadoopExternalJARs: dealing with JARs that aren't in the default set when running jobs.
- RunningSpark: running Spark jobs.
- Cassandra + Spark + Python
- Kafka + Spark
- Project technology choices
- Project cluster usage
- External Libraries & Spark
- GitLab for source control
See also the Links pages for more reference material.