Within this course, you are free to use whichever computing platform you prefer. The on-campus labs are a mix of Linux and Windows workstations. Instructions for the course will be predominantly for Linux.


There are three predominant platforms on the market:

  1. "Linux" - This is an umbrella term that includes many distros. Any modern distro is usable. The Big Data ecosystem is essentially Linux-based, so this is an ideal platform to use.

  2. Windows - If you are using Windows, it needs to be Windows 10 with WSL2. WSL2 adds a tightly integrated Linux machine that gives you the flexibility of using the Linux toolset within a mainstream UI. If you are new to WSL2 and/or Linux, start with Ubuntu 20.04.

  3. MacOS - If you are using MacOS, any recent release will do.

In all cases, the minimum hardware is 16GB of RAM and 500GB of (SSD) storage. You can get by with less of either, but you will need to pay attention to the workload.
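To see how a machine compares with these figures, a minimal sketch along the following lines may help. It assumes a Linux environment (including WSL2) where /proc/meminfo is available; the helper names kb_to_gb and report_hardware are illustrative and not part of any course tooling.

```shell
#!/bin/sh
# Convert kB (the unit used in /proc/meminfo and by df -k) to whole GB,
# rounding up because the kernel reports slightly less than the installed RAM.
kb_to_gb() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

# Print the installed RAM and the size of the root filesystem.
report_hardware() {
  if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "RAM:  $(kb_to_gb "$mem_kb") GB"
  else
    echo "RAM:  unknown (no /proc/meminfo; on macOS try: sysctl -n hw.memsize)"
  fi
  # df -Pk prints sizes in 1024-byte blocks in a portable format.
  disk_kb=$(df -Pk / | awk 'NR==2 {print $2}')
  echo "Disk: $(kb_to_gb "$disk_kb") GB (root filesystem)"
}

report_hardware
```

The numbers are only a rough guide: drive vendors count in powers of ten, so a "500GB" drive will show up as somewhat less here.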

Command shell

If you are not already familiar with a command shell, it's time to get comfortable. If you are starting out, either bash or zsh is a good choice, and both are readily available for the platforms above. Search online for setup guides and set up your environment. Note that in Windows, stay within the Linux VM and do not stray into Windows' console.
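A quick way to confirm which shell and platform you are actually working in is sketched below. The function name describe_platform is hypothetical; it only labels the output of uname -s, and anything other than Linux or Darwin suggests you may have strayed outside the Linux environment.

```shell
#!/bin/sh
# Map the output of `uname -s` to a short description.
describe_platform() {
  case "$1" in
    Linux)  echo "Linux (native or WSL2)" ;;
    Darwin) echo "macOS" ;;
    *)      echo "other: $1" ;;
  esac
}

echo "Login shell: ${SHELL:-unknown}"
echo "Platform:    $(describe_platform "$(uname -s)")"
```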

Editor (optional)

If you are not already using a GUI-based editor, install Atom or Sublime Text. You can also go with an IDE such as Eclipse, IntelliJ or Visual Studio Code.

Package Manager

You will want to use a package manager to manage the addition and removal of various software packages.

  1. In Linux, your package manager is typically built into your chosen distro. Read its documentation and get familiar with it.
  2. In MacOS, start with Homebrew.
  3. In Windows, note that you are using the Linux VM, so refer to the Linux item above. You may encounter Chocolatey, but it is not useful when you are installing software into your Linux VM.
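If you are unsure which package manager your environment provides, something like the following sketch can tell you. The list of candidates is an assumption covering common distros and Homebrew, and detect_pkg_manager is an illustrative name.

```shell
#!/bin/sh
# Print the first known package manager found on PATH, or "none".
detect_pkg_manager() {
  for pm in apt-get dnf yum pacman zypper brew; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  echo "none"
}

echo "Package manager: $(detect_pkg_manager)"
```

On Ubuntu (including under WSL2) this would typically report apt-get, and with Homebrew installed it would report brew.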

Additional managers to consider/add later on include:

  1. anaconda
  2. pyenv (Introduction to pyenv)
  3. SDKman

git front-end (optional)

Check your environment and install git per the git community's instructions. In many environments (e.g., MacOS and many Linux distros), git comes pre-installed, so check first.
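The check can be as simple as the sketch below, which reports the installed git version and the identity git will record in commits. The helper parse_git_version is illustrative only.

```shell
#!/bin/sh
# Extract the version number from a line such as "git version 2.39.2".
parse_git_version() {
  echo "$1" | awk '{print $3}'
}

if command -v git >/dev/null 2>&1; then
  echo "git $(parse_git_version "$(git --version)") is installed"
  # Show the identity that will be recorded in commits, if one is set.
  echo "user.name:  $(git config --global --get user.name || echo '<not set>')"
  echo "user.email: $(git config --global --get user.email || echo '<not set>')"
else
  echo "git not found; install it with your package manager"
fi
```

If either identity field is not set, git config --global user.name and git config --global user.email can be used to set them.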

Install GitHub Desktop as well to add a convenient GUI front-end to git. Optional but highly recommended.

See here for a git cheatsheet. (Additional tutorial at DZone and a supervisual cheatsheet by Matt Harrison.)

Docker (optional)

Containers are a very convenient technology, and Docker is a popular tool for working with one popular container format. Install docker (use your package manager) and Docker Desktop.
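Once installed, it is worth confirming both that the docker client is on your PATH and that it can reach a running daemon. The following is a minimal sketch; docker_status is a hypothetical helper name.

```shell
#!/bin/sh
# Print one of: "missing", "installed (daemon not reachable)", "ready".
docker_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "missing"
  elif ! docker info >/dev/null 2>&1; then
    echo "installed (daemon not reachable)"
  else
    echo "ready"
  fi
}

echo "Docker: $(docker_status)"
```

When the status is ready, running docker run --rm hello-world is a common end-to-end smoke test.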


Install the AWS CLI per Amazon's instructions.

Microsoft Azure (optional)

If you wish to experiment with Azure, install the Azure CLI per Microsoft's instructions. Azure's student credits have little to no restrictions and may be applied for via your SFU account.

Google Cloud Platform (optional)

Similarly, you can install the gcloud CLI per Google's instructions. Credits here are again usable and readily available.
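To see at a glance which of the cloud CLIs are already present, a sketch like the one below can be used. It assumes the standard executable names aws, az and gcloud; cli_status is an illustrative helper.

```shell
#!/bin/sh
# Report whether a command-line tool is on PATH.
cli_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: not installed"
  fi
}

for tool in aws az gcloud; do
  cli_status "$tool"
done
```

After installation, each CLI has its own interactive first-time setup: aws configure, az login and gcloud init respectively.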

GitHub (optional)

GitHub is a social platform for sharing and working with many open source projects.


  1. If you do not have a GitHub account, create an account here.

  2. Follow the instructions here for the command-line client. See here to configure your git for use with 2FA.

  3. To use GitHub's Container Registry feature to host Docker images, follow these instructions.

  4. Create a personal access token (PAT) for your GitHub account. You will need three scopes: read:packages, write:packages and delete:packages.

  5. Finally, configure your docker client to use GitHub Container Registry using this new token as follows:

    $ export CR_PAT=<your-token>
    $ echo $CR_PAT | docker login ghcr.io -u <your-github-id> --password-stdin
    > Login Succeeded
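
After logging in to the registry, images are addressed as ghcr.io/<owner>/<image>:<tag>. The sketch below builds such a reference; ghcr_ref is a hypothetical helper, and the owner, image and tag values are placeholders for illustration only.

```shell
#!/bin/sh
# Build an image reference of the form ghcr.io/<owner>/<image>:<tag>.
# The owner is lowercased because Docker repository names must be lowercase.
# The tag defaults to "latest" when omitted.
ghcr_ref() {
  owner=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  echo "ghcr.io/${owner}/$2:${3:-latest}"
}

# Placeholder values; substitute your own GitHub ID and image name.
ref=$(ghcr_ref "Your-GitHub-ID" "my-image" "v1")
echo "$ref"

# With a local image built and the registry login completed, the push would be:
#   docker tag my-image:v1 "$ref"
#   docker push "$ref"
```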
Updated Mon Aug. 29 2022, 10:52 by kaiyeec.