This page isn't meant to be a definitive guide to working with the Hadoop tools, but it should give you enough to get up and running for the course.
Of course, you are welcome to use your own computer for your work in this course. You can also connect remotely to the workstations in the physical labs.
- Platform: setting up your environment, depending on your OS.
- LabSpace: the lab/work space we have for the course.
- PythonSpark: working with Python and Spark code.
- SparkSkeleton: a reasonable skeleton for a Spark application.
- CompilingHadoop: compiling Hadoop code on your machine (command-line version).
- Cluster: working with the cluster we have available for this course.
- CompilingHadoop: also gives basic commands to get MapReduce jobs running.
- HadoopExternalJARs: dealing with JARs that aren't in the default set when running jobs.
- RunningSpark: running Spark jobs.
- Cassandra + Spark + Python
- Kafka + Spark
- Project technology choices
- Project cluster usage
- External Libraries & Spark
- GitLab for source control
See also the Links pages for more reference material.