Introduction to High Performance Computing with the Raspberry Pi: SFTP

Key Points

Introduction to High Performance Computing
  • A cluster is a group of computers connected to act as a single system.

  • Clusters are formed of nodes, each of which usually has several processors and tens or hundreds of gigabytes of RAM.

  • The Raspberry Pi is a simple single-board computer with between one and four processor cores and between half a gigabyte and four gigabytes of RAM.

Logging in to the cluster
  • ssh lets us log in to a remote computer system.

  • sinfo shows partitions and how busy they are.

Filesystems and Storage
  • The SCP and SFTP protocols can be used to copy files to and from a cluster.

  • Most real clusters have large disks.

  • Most real clusters have a backed-up, space-limited home directory and a larger scratch area for temporary data.

Running Jobs with Slurm
  • Interactive jobs let you test out the behaviour of a command, but aren’t practical for running many jobs.

  • Batch jobs are suited for submitting a job to run without user interaction.

  • Job arrays are useful for submitting many similar jobs at once.

  • Slurm lets you specify how many processors or nodes are allocated, how much memory to use, and how long the job can run.

Profiling Single Core Performance
  • Each programming language typically provides tools called profilers with which you can analyse the runtime of your code.

  • The estimation of pi spends most of its time generating random numbers.

  • The estimation of pi with the Monte Carlo method is a compute-bound problem, because pseudo-random numbers are generated purely by computation rather than by I/O.
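
The lesson's exact script is not shown here, but a minimal Monte Carlo pi estimate run under Python's standard cProfile profiler might look like this (the function name estimate_pi and sample count are illustrative):

```python
import cProfile
import random

def estimate_pi(n):
    """Monte Carlo estimate: 4 times the fraction of random points in the
    unit square that fall inside the quarter-circle of radius 1."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n

if __name__ == "__main__":
    # The profile output shows most cumulative time inside random number
    # generation, which is why the problem is compute bound.
    cProfile.run("estimate_pi(200_000)")
```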

Parallel Estimation of Pi
  • Amdahl’s law is a description of what you can expect of your parallelisation efforts.

  • Use the profiling data to calculate the time consumption of hot spots in the code.

  • The generation and processing of random numbers can be parallelised as it is a data parallel task.

  • Time consumption of a single application can be measured using the time utility.

  • The run time of a serial program divided by the run time of the equivalent parallel implementation is called the speed-up.
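
The speed-up ratio and the limit Amdahl's law places on it can be sketched as follows (the function names and the example timings, which you would measure with the time utility, are illustrative):

```python
def speed_up(t_serial, t_parallel):
    # Speed-up: serial run time divided by parallel run time.
    return t_serial / t_parallel

def amdahl_limit(parallel_fraction, n_workers):
    # Amdahl's law: the best possible speed-up when only a fraction p of
    # the work can be parallelised across n workers.
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_workers)

# e.g. measured with `time`: 60 s serially, 20 s on 4 cores
print(speed_up(60.0, 20.0))              # 3.0
# If 90% of the run time parallelises, 4 workers give at most about 3.08x.
print(round(amdahl_limit(0.9, 4), 2))
```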

Distributing computations among computers with MPI
  • The MPI driver mpirun sends compute jobs to a set of allocated computers.

  • The MPI software then executes these jobs on the remote hosts and synchronizes their state/memory.

  • MPI assigns a rank to each process; usually the process with rank zero does the coordination.

  • MPI can be used to split a task into components and have several nodes run them.
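
One common way an MPI program splits a task by rank is for each process to compute its own slice of the work from its rank and the total number of processes; with mpi4py these would come from MPI.COMM_WORLD as Get_rank() and Get_size(). The helper below is an illustrative sketch of that division, not code from the lesson:

```python
def rank_share(n_items, size, rank):
    """Half-open [start, stop) slice of n_items handled by one rank.
    Any remainder is spread one extra item each over the lowest ranks."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Four ranks splitting 10 items: ranks 0 and 1 take 3 items, ranks 2 and 3 take 2.
print([rank_share(10, 4, r) for r in range(4)])
```

Each rank then works on its own slice, and rank zero gathers the partial results together.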

HPC Best Practice

SFTP

If you want to use SFTP from the command line instead of with FileZilla, then these commands might be helpful.

Running SFTP

sftp firstname.surname@scp.hpcwales.co.uk

SFTP commands

Command  Purpose
ls       list files
pwd      current directory on the server
!pwd     current directory on the local system
get      copy a file from the server
put      copy a file to the server
lcd      change local directory
cd       change remote directory

Useful Links