This page isn't meant to be a definitive guide to working with the Hadoop tools, but will hopefully give you enough to get up and running for the course.
- LabSpace: the lab/work space we have for the course.
- PythonSpark: working with Python and Spark code.
- SparkSkeleton: a reasonable skeleton for a Spark app (a minimal sketch appears after this list).
- CompilingHadoop: compiling Hadoop code on your machine (command-line version); also gives basic commands to get MapReduce jobs running.
- Cluster: working with the cluster we have available for this course.
- HadoopExternalJARs: dealing with JARs that aren't in the default set when running jobs.
- RunningSpark: running Spark jobs.
- Cassandra + Spark + Python
- Kafka + Spark
- Project technology choices
- Project cluster usage
- External Libraries & Spark
- GitLab for source control
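For orientation, here is a minimal sketch of what a Spark application skeleton might look like in Python. The two-argument input/output pattern, the `main` function, and the application name are assumptions for illustration; the SparkSkeleton page has the template actually used in the course.

```python
import sys
from pyspark.sql import SparkSession

def main(inputs, output):
    # Replace this with real work: here we just copy the input text to the output.
    data = spark.read.text(inputs)
    data.write.text(output)

if __name__ == '__main__':
    # Hypothetical convention: first argument is the input path(s),
    # second is the output directory.
    inputs = sys.argv[1]
    output = sys.argv[2]
    spark = SparkSession.builder.appName('example application').getOrCreate()
    main(inputs, output)
```

A script like this would typically be run with something like `spark-submit app.py input output`; see the RunningSpark page for the commands that apply on our cluster.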
See also the Links pages for more reference material.