Tutorial: Visibility Data Reduction on Setonix
Starting with raw MWA data, we will preprocess, calibrate, and produce an image. For a more thorough tutorial including some quality-analysis steps, check out mwa-demo. For Pawsey documentation on Setonix, check out the Setonix User Guide.
For this tutorial, we will use already-provided raw data in an interactive session, but data reduction at scale typically involves a Slurm script that uses giant-squid to download raw or calibrated visibilities from ASVO.
Interactive Session
It is recommended to use a GPU node for hyperdrive, although there are CPU-only builds available. There is currently no port of IDG for the AMD GPUs, so wsclean does not make use of GPUs, and neither does Birli.
Users in the mwaeor, mwasci and mwavcs groups can request CPU nodes on one of the 11 nodes in the mwa partition with the salloc command. The mwa partition is just a subset of the much larger work partition with over 1300 nodes. The optimal ratio for billing purposes on these Setonix nodes is 1840MB of memory per core, for up to 128 cores. Birli will use as many cores as you can give it, and can preprocess data in chunks to fit in memory.
# request 1/2 of a CPU node for an hour
salloc \
--partition=mwa \
--account=${PAWSEY_PROJECT} \
--time 01:00:00 \
--mem 117760M \
--cpus-per-task 64
You may need to request a node from the highmem partition for larger workloads, but with only 8 nodes shared between thousands of users, you may be waiting a while. The optimal billing ratio seems to be 7900MB per CPU. A policy is in place that restricts users to two highmem nodes at a time.
# request 1/2 of a highmem CPU for an hour
salloc \
--partition=highmem \
--account=${PAWSEY_PROJECT} \
--time 01:00:00 \
--mem 505600M \
--cpus-per-task 64
Requesting GPU nodes is slightly different. Since there are 8 GPUs per node, you can only request resources in multiples of 1/8 of a GPU node (32 cores, 58880MB of memory) using the --gres=gpu:N flag. hyperdrive can only make use of a single GPU at the moment, so if you need more memory, you need to request more GPUs, which you won't be able to make use of.
# request 1/8 of a GPU node for an hour
salloc \
--nodes=1 \
--partition=mwa-gpu \
--account=${PAWSEY_PROJECT}-gpu \
-t 01:00:00 \
--gres=gpu:1
There are also 39 gpu-highmem nodes, with twice the memory.
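If you need the extra memory, a sketch of requesting one of those, assuming the partition is simply named gpu-highmem (the exact partition name is an assumption here; check the Setonix User Guide or sinfo):
# request 1/8 of a gpu-highmem node for an hour (partition name assumed)
salloc \
--nodes=1 \
--partition=gpu-highmem \
--account=${PAWSEY_PROJECT}-gpu \
-t 01:00:00 \
--gres=gpu:1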
Setup
Create a directory on the scratch filesystem to store the data we need for the tutorial, and change directory into it. Note that files on scratch are subject to a purge policy that deletes old files; see Pawsey Filesystems and their Use for details.
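For example, assuming the standard Pawsey scratch layout of /scratch/${PAWSEY_PROJECT}/${USER} (the tutorial subdirectory name is arbitrary):
# make a working directory on scratch and move into it
mkdir -p /scratch/${PAWSEY_PROJECT}/${USER}/tutorial
cd /scratch/${PAWSEY_PROJECT}/${USER}/tutorial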
Download our calibration sky model, storing the filename in the srclist environment variable.
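A minimal sketch, with a placeholder URL and filename (substitute the actual sky-model file for your target field):
# download the sky model and record its path (URL is a placeholder)
export srclist="${PWD}/srclist.yaml"
wget -O "${srclist}" "https://example.com/path/to/srclist.yaml"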
We will skip over obtaining raw data, but see MWA ASVO Use with HPC Systems and mwa-demo/demo/02_download.sh for details. This tutorial uses raw data, which allows for more flexibility in preprocessing, flagging and calibration options, but the calibrated visibilities that can be obtained from ASVO are often sufficient for most science cases. We'll process MWA observation 1121334536 (D0006:CenA_145).
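For reference, a sketch of the typical giant-squid workflow for this observation (giant-squid reads your ASVO API key from the MWA_ASVO_API_KEY environment variable):
# submit a raw-visibility job to ASVO, check its status, then download when ready
module load giant-squid/default
giant-squid submit-vis 1121334536
giant-squid list
giant-squid download 1121334536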
Preprocessing
Birli is the MWA preprocessor. It takes raw files, performs RFI flagging, instrument corrections and format conversion, resulting in a preprocessed visibility file in the uvfits format (measurement set is supported too). Birli is available on Setonix via module load birli/default. You should specify the --max-memory argument (in gigabytes, with a safety factor) if your observation doesn't fit in memory, but chunking too much will have consequences for flagging performance.
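A minimal sketch of a Birli invocation, assuming the metafits and raw gpubox files are in the working directory (the filenames and memory cap are illustrative):
# flag, correct and convert the raw files to uvfits, capping memory at ~100 GB
birli \
--metafits 1121334536.metafits \
--uvfits-out 1121334536.uvfits \
--max-memory 100 \
1121334536_*gpubox*.fits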
Calibration
hyperdrive is the MWA calibration suite. It has excellent documentation. When Pawsey installed hyperdrive, for some reason they made the CPU-only version the default: module load hyperdrive/default. It is recommended to use the GPU version with module load hyperdrive-amd-gfx90a/default. Much like the rest of the software on Setonix, things might move around, so check out module avail hyperdrive for the latest available modules.
The first step in direction-independent calibration is to produce calibration solutions.
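A sketch of that step with hyperdrive di-calibrate, using the preprocessed uvfits and the srclist variable from earlier (the solutions filename is illustrative):
# produce direction-independent calibration solutions (uses the GPU build)
hyperdrive di-calibrate \
--data 1121334536.metafits 1121334536.uvfits \
--source-list "${srclist}" \
--outputs hyp_sols.fits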
After this point, the remaining hyperdrive steps (solutions-plot and solutions-apply) do not use the GPU, so if you want to optimize your billing, it's best to do these steps on a CPU node.
At this point, you would typically inspect the calibration solutions for quality issues with hyperdrive solutions-plot; other quality-analysis tools are explored in mwa-demo/demo/06_cal.sh. Once you are happy with the solutions, you can apply them to the data. In this case, we will produce a measurement set, as this is required by wsclean. However, writing measurement sets to scratch is not recommended due to deficiencies in the format itself: a measurement set is a directory of many small files, which performs poorly on Lustre filesystems like scratch.
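A sketch of both steps, carrying over the filenames from the calibration sketch above:
# plot the solutions for inspection, then apply them to write a calibrated measurement set
hyperdrive solutions-plot --metafits 1121334536.metafits hyp_sols.fits
hyperdrive solutions-apply \
--data 1121334536.metafits 1121334536.uvfits \
--solutions hyp_sols.fits \
--outputs hyp_cal.ms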
Imaging
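A sketch of a basic wsclean run on the calibrated measurement set (the module name, image size, pixel scale and cleaning parameters are illustrative; tune them for your field):
# image and deconvolve the calibrated measurement set
module load wsclean/default
wsclean \
-name 1121334536 \
-size 2048 2048 \
-scale 20asec \
-niter 10000 \
-mgain 0.8 \
-auto-threshold 3 \
hyp_cal.ms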
You can then view the resulting image with CARTA (without needing to download the file locally).
Other stuff
There are plenty more modules you can use; check out mwa-demo.
module load giant-squid/default can download MWA visibilities (raw, preprocessed, or calibrated)
module load mwalib/default is a library for reading raw MWA visibilities