Tutorial: Visibility Data Reduction on Setonix

Starting with raw MWA data, we will preprocess, calibrate, and produce an image. For a more thorough tutorial including some quality analysis steps, check out mwa-demo. For Pawsey documentation on Setonix, check out https://pawsey.atlassian.net/wiki/spaces/US/pages/51925434/Setonix+User+Guide

For this tutorial, we will use already-provided raw data in an interactive session, but data reduction at scale typically involves a Slurm script that uses giant-squid to download raw or calibrated visibilities from ASVO.

Interactive Session

It is recommended to use a GPU node for hyperdrive, although there are CPU-only builds available. There is currently no port of IDG for the AMD GPUs, so wsclean does not make use of GPUs, and neither does Birli.

Users in the mwaeor, mwasci, and mwavcs groups can request CPU nodes on one of the 11 nodes in the mwa partition with the salloc command. The mwa partition is a subset of the much larger work partition, which has over 1300 nodes. The optimal ratio for billing purposes on these Setonix nodes is 1840MB of memory per core, for up to 128 cores. Birli will use as many cores as you can give it, and can preprocess data in chunks to fit in memory.

# request 1/2 of a CPU node for an hour
salloc \
  --partition=mwa \
  --account=${PAWSEY_PROJECT} \
  --time 01:00:00 \
  --mem 117760M \
  --cpus-per-task 64

You may need to request a node from the highmem partition for larger workloads, but with only 8 nodes available between thousands of users, you may be waiting a while. The optimal billing ratio seems to be 7900MB per core.

# request 1/2 of a highmem CPU node for an hour
salloc \
  --partition=highmem \
  --account=${PAWSEY_PROJECT} \
  --time 01:00:00 \
  --mem 505600M \
  --cpus-per-task 64

Requesting GPU nodes is slightly different. Since there are 8 GPUs per node, you can only request a multiple of 1/8 of a GPU node (32 cores, 58880MB of memory) using the --gres=gpu:N flag. hyperdrive can only make use of a single GPU at the moment, so if you need more memory, you need to request more GPUs, which you won't be able to make use of.

# request 1/8 of a GPU node for an hour
salloc \
  --nodes=1 \
  --partition=mwa-gpu \
  --account=${PAWSEY_PROJECT}-gpu \
  -t 01:00:00 \
  --gres=gpu:1

There are also 39 gpu-highmem nodes, with twice the memory.

Setup

Create a directory on the /scratch filesystem to store the data we need for the tutorial, and change directory into it. Note that files on /scratch are subject to a purge policy that deletes them after a period of time; see https://pawsey.atlassian.net/wiki/spaces/US/pages/51925876/Pawsey+Filesystems+and+their+Use for details.
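
For example (a minimal sketch; the exact directory layout under /scratch is up to you):

# create a working directory on /scratch and move into it
# (the path below is illustrative)
mkdir -p /scratch/${PAWSEY_PROJECT}/${USER}/tutorial
cd /scratch/${PAWSEY_PROJECT}/${USER}/tutorial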

Download our calibration sky model, storing the filename in the srclist environment variable.
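
For example, using a model from the JLBLine/srclists repository (the particular model file here is an assumption for illustration; choose one appropriate for your field):

# download a sky model and record its filename in $srclist
# (this specific model file is illustrative)
export srclist="srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt"
wget "https://github.com/JLBLine/srclists/raw/master/${srclist}"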

We will skip over obtaining raw data, but see MWA ASVO Use with HPC Systems and https://github.com/MWATelescope/mwa-demo/blob/main/demo/02_download.sh for details; a sketch of the download commands is shown below. This tutorial uses raw data, which allows for more flexibility in preprocessing, flagging, and calibration options, but the calibrated visibilities that can be obtained from ASVO are often sufficient for most science cases. We'll process http://ws.mwatelescope.org/observation/obs/?obsid=1121334536
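
For reference, downloading the raw files with giant-squid might look like this (a sketch, assuming module load giant-squid/default and a valid MWA_ASVO_API_KEY in your environment):

# submit an ASVO job for the raw visibilities of our observation,
# wait for it to become ready, then download it to the current directory
giant-squid submit-vis 1121334536
giant-squid wait 1121334536
giant-squid download 1121334536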

Preprocessing

Birli is the MWA preprocessor. It takes raw files, performs RFI flagging, instrumental corrections, and format conversion, producing a preprocessed visibility file in the uvfits format (measurement sets are also supported). Birli is available on Setonix via module load birli/default. You should specify the --max-memory argument (in gigabytes, with a safety factor) if your observation doesn't fit in memory, but chunking too aggressively will have consequences for flagging performance.
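
A typical invocation looks something like this (a sketch: the filenames follow the usual obsid naming, and the memory limit is illustrative):

# preprocess the raw gpubox files into a uvfits file,
# limiting Birli to roughly 100 GB of memory
birli \
  --metafits 1121334536.metafits \
  --uvfits-out 1121334536.uvfits \
  --max-memory 100 \
  1121334536_*gpubox*.fits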

Calibration

hyperdrive is the MWA calibration suite. It has excellent documentation. When Pawsey installed hyperdrive, for some reason they made the CPU-only version the default: module load hyperdrive/default. It is recommended to use the GPU version with module load hyperdrive-amd-gfx90a/default. Much like the rest of the software on Setonix, things might move around, so check module avail hyperdrive for the latest available modules.

The first step in direction-independent calibration is to produce calibration solutions with hyperdrive di-calibrate.
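
For example (a sketch: the source count and output filename are illustrative):

# produce calibration solutions from the preprocessed visibilities,
# using the 500 brightest sources from our sky model
hyperdrive di-calibrate \
  --data 1121334536.uvfits 1121334536.metafits \
  --source-list "${srclist}" \
  --num-sources 500 \
  --outputs hyp_soln_1121334536.fits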

After this point, the remaining hyperdrive steps (solutions-plot and solutions-apply) do not use the GPU, so if you want to optimize your billing, it’s best to do these steps on a CPU node.

At this point, you would typically inspect the calibration solutions for quality issues with hyperdrive solutions-plot; other quality analysis tools are explored in https://github.com/MWATelescope/mwa-demo/blob/1b044e185e0594027253ec9aa9a7f13c0a35df78/demo/06_cal.sh#L106 . Once you are happy with the solutions, you can apply them to the data with hyperdrive solutions-apply. In this case, we will produce a measurement set, as this is required by wsclean. However, writing measurement sets to /scratch is not recommended due to deficiencies in the format itself.
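
For example (a sketch; output filenames are illustrative):

# plot the amplitudes and phases of the solutions as PNGs for inspection
hyperdrive solutions-plot hyp_soln_1121334536.fits

# apply the solutions, writing a calibrated measurement set for wsclean
hyperdrive solutions-apply \
  --data 1121334536.uvfits 1121334536.metafits \
  --solutions hyp_soln_1121334536.fits \
  --outputs hyp_cal_1121334536.ms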

Imaging
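
wsclean can then image the calibrated measurement set. A basic run might look like the following (a sketch: the image size, pixel scale, and cleaning parameters are illustrative and should be tuned to your observation; -apply-primary-beam assumes your wsclean build has beam support):

# image the calibrated measurement set
# (size/scale/niter values are illustrative; tune them for your field)
wsclean \
  -name wsclean_1121334536 \
  -size 2048 2048 \
  -scale 40asec \
  -niter 10000 \
  -auto-threshold 3 \
  -apply-primary-beam \
  hyp_cal_1121334536.ms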

You can then view the image with CARTA (https://pawsey.atlassian.net/wiki/spaces/US/pages/91095109/CARTA) without needing to download the file locally.

[Image: primary-beam-corrected wsclean image of obsid 1121334536 (wsclean_1121334536-image-pb.fits)]

Other stuff

There are plenty more modules you can use; check out mwa-demo.

  • module load giant-squid/default can download MWA visibilities (raw, preprocessed, or calibrated)

  • module load mwalib/default provides a library for reading raw MWA visibilities