Course Technology
This page isn't meant to be a definitive guide to working with the Hadoop tools, but it should give you enough to get up and running for the course.
Working Remotely
Of course, you are welcome to use your own computer for your work in this course. You can also connect remotely to the workstations in the physical labs.
Writing Code
- Platform: setting up your environment, depending on your OS.
- LabSpace: the lab/work space we have for the course.
- PythonSpark: working with Python and Spark code.
- SparkSkeleton: a reasonable skeleton for a Spark app.
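The SparkSkeleton page covers this properly; as a rough idea, a minimal PySpark application tends to look something like the sketch below. The names, paths, and structure here are illustrative only, not the course's actual skeleton code.

```python
import sys

def main(inputs, output):
    # Imported inside the function so the file can be read or tested on a
    # machine without pyspark installed; in a real job a top-level import
    # is more typical.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('example application').getOrCreate()
    lines = spark.read.text(inputs)   # input path -> DataFrame of text lines
    # ... your actual transformations would go here ...
    lines.write.text(output)          # write results to the output path
    spark.stop()

if __name__ == '__main__':
    # When run with spark-submit, the input and output paths arrive as
    # command-line arguments after the script name.
    if len(sys.argv) >= 3:
        main(sys.argv[1], sys.argv[2])
```

You would run a script like this with `spark-submit`, passing the input and output paths as arguments; see the RunningSpark page for the details.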
Compiling Code
- CompilingHadoop: compiling Hadoop code on your machine (command-line version).
Running Code
- Cluster: working with the cluster we have available for this course.
- CompilingHadoop: also gives basic commands to get MapReduce jobs running.
- HadoopExternalJARs: dealing with JARs that aren't in the default set when running jobs.
- RunningSpark: running Spark jobs.
- Cassandra + Spark + Python
- Kafka + Spark
- Spark + S3
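The MapReduce jobs mentioned above all follow the same programming model: a map step emits (key, value) pairs, and a reduce step combines all the values that share a key. A small pure-Python sketch of that model (illustrative only; a real job runs on the cluster, not like this):

```python
from collections import defaultdict

def map_step(line):
    # The "map" phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_step(pairs):
    # The "reduce" phase: sum the counts for each distinct key (word).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Simulate a tiny word-count job over two "input" lines.
lines = ["the quick brown fox", "the lazy dog"]
pairs = [pair for line in lines for pair in map_step(line)]
result = reduce_step(pairs)
# result["the"] == 2; every other word appears once
```

On a real cluster, the framework handles splitting the input, shuffling the pairs by key, and running the reducers in parallel; your code supplies only the two functions.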
Final Project
See also the Links pages for more reference material.
Updated Thu Aug. 22 2024, 11:06 by ggbaker.