
Compute Cluster

We have a relatively modest Hadoop cluster for this course: 4 nodes, 60 cores, 128GB memory, 16TB storage.

Connecting Remotely

The goal here is to connect to cluster.cs.sfu.ca by SSH.

From On-Campus

[This will work if you are within the campus network (or have the SFU VPN).]

You will be connecting to the cluster a lot, so you may want to set things up more nicely to make your life easier later. But this should at least work.

You generally just need to SSH to cluster.cs.sfu.ca (substituting whatever SSH method you use on your computer):

[yourcomputer]$ ssh -p24 <USERID>@cluster.cs.sfu.ca 
[gateway]$

Once you're connected to the cluster gateway, you can start running spark-submit (and hdfs) commands.
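For example, to check that you can talk to HDFS and see what's in your HDFS home directory:

[gateway]$ hdfs dfs -ls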

If you need access to the web frontends in the cluster, you can do the initial SSH with a longer command including some port forwards:

[yourcomputer]$ ssh -p24 -L 9870:controller.local:9870 -L 8088:controller.local:8088 <USERID>@cluster.cs.sfu.ca

From On-Campus With SSH Keys

Once you have confirmed that you can connect, get things set up properly.

Create an SSH key (if you don't have one already) so you can log in without a password. Then copy your public key into .ssh/authorized_keys on the server (with ssh-copy-id or by appending to ~/.ssh/authorized_keys).
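If you're starting from scratch, the key setup can look like this (ed25519 is a reasonable modern key type; -p 24 matches the cluster's SSH port):

[yourcomputer]$ ssh-keygen -t ed25519
[yourcomputer]$ ssh-copy-id -p 24 <USERID>@cluster.cs.sfu.ca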

Create (or add to) the ~/.ssh/config file on your computer. With this config, you can simply ssh cluster.cs.sfu.ca to connect. (bonus: tab-completion)

Host cluster.cs.sfu.ca
  User <USERID>
  Port 24
  LocalForward 8088 controller.local:8088
  LocalForward 9870 controller.local:9870

# Use this via `ssh clustergw` if you're connecting from off-campus without VPN
Host clustergw
  HostName cluster.cs.sfu.ca
  User <USERID>
  Port 24
  LocalForward 8088 controller.local:8088
  LocalForward 9870 controller.local:9870
  ProxyCommand ssh -p 24 <USERID>@gateway.csil.sfu.ca exec nc %h %p

With this configuration, the port forwards will let you connect (in a limited, unauthenticated way) to the web interfaces while your SSH session is open: the HDFS NameNode at http://localhost:9870 and the YARN ResourceManager at http://localhost:8088.

Copying Files

You will also frequently need to copy files to the cluster:

[yourcomputer]$ scp -P 24 code.py <USERID>@cluster.cs.sfu.ca:

Or whatever your preferred SCP/SFTP method is.
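If you prefer rsync, it also runs over SSH and will pick up the port and username from the config entry above:

[yourcomputer]$ rsync -av code.py cluster.cs.sfu.ca: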

From Off-Campus

From off-campus networks, you need an extra hop to get to the cluster. The most reliable way is probably to SSH to gateway.csil.sfu.ca (port 24). We need to do a two-step port forward. On a Linux-like system, you can add this to your ~/.ssh/config to forward from your computer to the gateway:

Host gateway.csil.sfu.ca
  User <USERID>
  Port 24
  LocalForward 8088 localhost:8088
  LocalForward 9870 localhost:9870
  ServerAliveInterval 120
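
With that entry in place, connect to the gateway:

[yourcomputer]$ ssh gateway.csil.sfu.ca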

Then on gateway.csil, you can set up an SSH key and config so connecting is easy in the future:

[gateway.csil]$ ssh-keygen -t ed25519
[gateway.csil]$ echo -e "Host cluster.cs.sfu.ca\n Port 24\n ServerAliveInterval 120\n LocalForward 8088 controller.local:8088\n LocalForward 9870 controller.local:9870" >> ~/.ssh/config
[gateway.csil]$ ssh-copy-id cluster.cs.sfu.ca
[gateway.csil]$ ssh cluster.cs.sfu.ca

Then, to connect in the future, hop through the gateway:

[yourcomputer]$ ssh gateway.csil.sfu.ca
[gateway.csil]$ ssh cluster.cs.sfu.ca

Copying Files

Copying files will also be a two-step process: first to gateway.csil, and then from there to the cluster:

[yourcomputer]$ scp mycode.py gateway.csil.sfu.ca:
[gateway.csil]$ scp mycode.py cluster.cs.sfu.ca:

Spark Applications

Once you have the code there, you can start jobs as usual with spark-submit, and they will be sent to the cluster:

spark-submit code.py ...
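
If your job writes its results to an HDFS directory (output here is just a placeholder name), you can inspect them afterwards:

hdfs dfs -ls output
hdfs dfs -cat 'output/part-*' | less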

Cleaning Up

If you have unnecessary files sitting around (especially large files created as part of an assignment), please clean them up with a command like this:

hdfs dfs -rm -r output*
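
If you're not sure what's using space, you can check your HDFS home directory first (the -h flag prints human-readable sizes):

hdfs dfs -du -h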

It is possible that you have jobs running and consuming resources without knowing: maybe you created an infinite loop or otherwise have a job burning memory or CPU. You can list jobs running on the cluster like this:

yarn application -list
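
If the list is long, you can limit it to applications that are still running:

yarn application -list -appStates RUNNING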

And kill a specific job:

yarn application -kill <APPLICATION_ID>