Running Parallel Jobs

Shared memory

A shared memory job runs multiple threads or processes that share memory on a single machine. The user should write a script requesting a single compute node, with at most the following number of threads on each machine:

  • 16 on the partition chalawan_cpu

  • 24 on the node pollux1

  • 28 on the nodes pollux2 and pollux3

It is also recommended that the program be written with OpenMP directives or C/C++ multi-threading; a minimal OpenMP sketch follows the job script below. An example script is shown here

#!/bin/bash

#SBATCH -J shared     # Job name
#SBATCH -N 1          # Total number of nodes requested
#SBATCH -n 16         # Total number of mpi tasks
#SBATCH -t 120:00:00  # Run time (hh:mm:ss)

mpirun -np 16 -ppn 16 [ options ] <program> [ <args> ]  # all 16 tasks on the single allocated node
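
For reference, here is a minimal sketch of the kind of OpenMP program such a script could launch. It is illustrative only, and the file name hello_omp.c is an assumption, not part of the cluster documentation; a pure OpenMP program would be run directly (without mpirun) after setting OMP_NUM_THREADS to the requested core count, e.g. export OMP_NUM_THREADS=16.

/* hello_omp.c -- illustrative OpenMP sketch; each thread in the team
 * prints its own ID. The team size is taken from OMP_NUM_THREADS. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

Compiled with, e.g., gcc -fopenmp hello_omp.c -o hello_omp, the program prints one line per thread, all threads running on the single allocated node.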

Distributed memory

For distributed memory, each process has its own memory and does not share it with any other process. A distributed memory job can run across multiple compute nodes. It requires a program written with a message-passing library such as the Message Passing Interface (MPI), and it requires additional settings to scatter the processes over the compute nodes. Suppose we want to run a job with 16 processes, 4 on each of 4 compute nodes; we may write:

#!/bin/bash

#SBATCH -J distributed # Job name
#SBATCH -N 4           # Total number of nodes requested
#SBATCH -n 16          # Total number of mpi tasks
#SBATCH --ntasks-per-node=4 # Number of tasks per node
#SBATCH -t 120:00:00   # Run time (hh:mm:ss)

mpirun -np 16 -ppn 4 [ options ] <program> [ <args> ]
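
To illustrate what such a job runs, here is a minimal MPI sketch; the file name hello_mpi.c is an assumption for illustration. Each rank reports its rank number, the total number of ranks, and the node it is running on, which makes it easy to verify that the processes were scattered 4 per node.

/* hello_mpi.c -- illustrative MPI sketch; each rank reports where it
 * is running. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */
    MPI_Get_processor_name(host, &len);    /* node this rank runs on */

    printf("Rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

Compiled with an MPI wrapper such as mpicc (mpicc hello_mpi.c -o hello_mpi) and submitted with sbatch, the output should show 4 ranks on each of the 4 allocated nodes.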