Introduction
Since the initial creation of the legacy documentation, there have been significant changes to the VCS data processing workflow due to the introduction of the MWAX correlator and the new beamforming software, VCSBeam. The purpose of this documentation is to provide an updated description of the processing workflow. Please note that this page is a work in progress; please direct any questions or suggestions to Christopher Lee or Bradley Meyers.
This documentation deals with data collected with the MWA's Voltage Capture System (VCS), which is described in Tremblay et al. (2015). The MWA tied-array processing paper series provides details about calibration, beamforming, and signal processing – see Ord et al. (2015), Xue et al. (2019), McSweeney et al. (2020), and Swainston et al. (2022). Additionally, the VCSBeam documentation provides useful information about some theory and examples regarding calibration and beamforming.
While not necessary, it will be useful to have a basic understanding of radio interferometry when reading this documentation, so this course may be a useful resource. Additionally, the PRESTO search tutorial provides a good introduction to the software tools available in PRESTO.
Environment setup
Software Usage on Setonix

Most of our VCS data processing was previously performed on Pawsey's Garrawarla cluster, but is now primarily performed on the Setonix supercomputer. Installed software is managed using Lmod, which keeps the login environment tidy and reproducible. We refer you to the Setonix User Guide for the complete details on the software stack and how to run jobs. The relevant details are summarised here.
Modules
Members of the mwavcs Pawsey group will have access to the common software used to calibrate and reduce VCS data. These can be browsed with:
```
module avail
```
To load a module, the user must specify the module name and version like so:

```
module load <module>/<version>
```
Previously, on the Garrawarla cluster, the custom MWA software modules were accessed by first running `module use /pawsey/mwa/software/python3/modulefiles`. You could then browse the available modules which do not conflict with your machine using `module avail`, and load a module with `module load <module>`. For example, to use Singularity containers, you must first load the `singularity` module.
Loading modules via a bash profile

Often one wants to have commonly used modules loaded automatically, both for convenience and to keep the environment consistent. On Garrawarla, this can be done by creating a bash profile and sourcing it when you want to load the modules. For pulsar processing, we have a standard profile which can be loaded by adding the following to your .bashrc: `alias sp3='source /pawsey/mwa/software/profiles/mwavcs_pulsar.profile'`. Running the command sp3 will load most of the standard VCS software and dependencies. The standard profile can also be used as a template when creating a custom profile.
Using Common Pulsar Software
On Setonix, the common pulsar software packages are provided as singularity images (`.sif` files), which can be used interactively or to execute routines on the command line. There are two ways in which these software containers can be used: directly (as was previously done on Garrawarla), or via the module system - we'll describe both below.
There are three containers, all of which are located in `/software/projects/mwavcs/setonix/2024.05/containers/`. Specifically:
- `psr-timing`: contains the standard pulsar timing software packages, `tempo`, `tempo2` and `PINT` (along with updated clock files from when the image was produced).
- `psr-search`: contains the most commonly used searching software packages, `PRESTO` and `riptide`.
- `psr-analysis`: contains the most commonly used analysis tools and utilities, `PSRCHIVE`, `DSPSR`, `PSRSALSA`, `clfd` and `PulsePortraiture`.
If there are missing packages in any of the above containers, please let someone know via Slack or email so that we can re-build and include what you need. (Although be prepared to answer any compilation/installation questions that may arise!)
You can see the Dockerfiles used here: https://github.com/CIRA-Pulsars-and-Transients-Group/pulsar-software-containers/tree/main
Directly using the containers
First, we need to load the Singularity module:
Loading the Singularity module
```
module use /software/projects/mwavcs/setonix/2024.05/modules/zen3/gcc/12.2.0
module load singularity/4.1.0-mwavcs
```
Using common pulsar software (Garrawarla)
Pulsar software is difficult to install at the best of times, so the common packages are not currently natively installed on Garrawarla, but are provided via containerisation. There are two generic Singularity containers available to users that focus on two different aspects of pulsar science/processing.
The psr-search
container includes common pulsar searching tools including PRESTO and riptide (FFA implementation). It can be accessed as shown below.
```
/pawsey/mwa/singularity/psr-search/psr-search.sif <command>
```
The psr-analysis
container includes common pulsar analysis tools including DSPSR, PSRCHIVE, and various timing packages. It can be accessed as shown below.
```
/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif <command>
```
For programs with interactivity, the containers must be run through Singularity as shown below.
```
singularity run -B ~/.Xauthority <container> <command>
```
To make using these containers easier, you can add the file paths as environment variables in your .bashrc
like so:
```
export PSR_SEARCH_CONT=/pawsey/mwa/singularity/psr-search/psr-search.sif
export PSR_ANALYSIS_CONT=/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif
```
For further convenience, you can set aliases for common commands, e.g.
```
alias pam="$PSR_ANALYSIS_CONT pam"
alias pav="singularity run -B ~/.Xauthority $PSR_ANALYSIS_CONT pav"
```
Downloading data (Garrawarla)
Using the ASVO Web Application
MWA observation data are accessed via the ASVO. The simplest way to download an observation is via the ASVO Web Application. To find an observation, navigate to the Observations page and use the Search menu filters. To see all SMART observations, find the Project option in the Advanced Search menu and select project G0057.
For VCS-mode observations, you will need to specify the time offset and the duration of time to download. The following example shows the Job menu for downloading 600 seconds at the start of an observation.
For correlator-mode observations, such as calibrator observations, navigate to the Visibility Download Job tab and select the delivery location as /astro
. An example is shown below.
Once the jobs are submitted, they can be monitored via the My Jobs page, which will display the ASVO Job ID and the status of the download. The downloaded data will appear in /astro/mwavcs/asvo/<ASVO Job ID>
.
Using Giant Squid
Submitting ASVO jobs can also be done from the command line using Giant Squid. To use Giant Squid, you must first set your ASVO API key as an environment variable. You can find your API key in your Profile page on the ASVO Web Application. Then add the following to your .bashrc
```
export MWA_ASVO_API_KEY=<api key>
```
Currently, the voltage download mode is only available through the docker image, since a module for version 0.7.0 is yet to be created. To use the docker image, add the following alias to your .bashrc
```
alias giant-squid='module load singularity; singularity exec -B $PWD docker://mwatelescope/giant-squid:latest /opt/cargo/bin/giant-squid'
```
The first time that giant-squid is run, it will cache the image for subsequent runs. Alternatively, there is a version of giant-squid
located at /astro/mwavcs/software/giant-squid.sif
that anyone in the group can use. It was compiled on 1 Nov 2023. Using that pre-installed version, the above command would instead be replaced by
```
alias giant-squid='module load singularity; singularity exec -B $PWD /astro/mwavcs/software/giant-squid.sif /opt/cargo/bin/giant-squid'
```
VCS download jobs can then be submitted with the submit-volt
subcommand as follows:
```
giant-squid submit-volt --delivery astro --offset <offset> --duration <duration> <obs ID>
```
If the command is being used in a pipeline, then the --wait
option can be used to keep the program open until the observation is ready for download. Calibrator observation download jobs can be submitted using the submit-vis
subcommand:
```
giant-squid submit-vis --delivery astro <obs ID>
```
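For example, a pipeline step might submit both downloads and block until ASVO has staged them (a minimal sketch; the obs IDs are placeholders and the flags are those described above):

```
#!/bin/bash
# Submit the VCS and calibrator downloads and wait until ASVO marks them ready
giant-squid submit-volt --wait --delivery astro --offset 0 --duration 600 <obs ID>
giant-squid submit-vis --wait --delivery astro <cal obs ID>
# ...continue with preprocessing once the data appear under /astro/mwavcs/asvo/...
```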
The standard directory structure (Garrawarla)
Due to the large data volume of VCS observations and to assist with the automation of many of the repetitive processing steps, we use a standard directory structure to store downloaded data. Data on Garrawarla are currently stored on the /astro
partition under the /astro/mwavcs
group directory. In the past, downloaded data have been stored in /astro/mwavcs/vcs/<obs ID>
, however the pooling of all users downloads into one directory can be messy and difficult to manage. Therefore, VCS downloads should be stored in the user's personal VCS directory, i.e. /astro/mwavcs/${USER}/<obs ID>
. Within this directory, the raw data (.sub
or .dat
files) should be stored in a subdirectory called combined
, while the metafits file of the VCS observation should remain in the parent directory. A second subdirectory called cal
should contain a directory for each calibration observation, within which are the visibilities and the calibration metafits. Within each calibrator's directory, the calibration solutions are stored within a subdirectory called hyperdrive
. Beamformed data are stored under a directory called pointings
, organised by source name. This is summarised below:
- /astro/mwavcs/${USER}/<obs ID>
- /combined
- Raw VCS data (.sub or .dat files)
- /cal
- <cal ID>
- Visibilities (.fits files)
- <cal ID>.metafits
- /hyperdrive
- Calibration solution
- ...
- <cal ID>
- /pointings
- <Pointing 1>
- ...
- <obs ID>.metafits
- /combined
Calibration (Garrawarla)
Finding a calibration observation
In order to form a coherent tied-array beam, the antenna delays and gains need to be calibrated. This is commonly done using a dedicated observation of a bright calibrator source such as Centaurus A or Hercules A. Calibrator observations are taken in correlator mode and stored as visibilities. To find a calibration observation, either search the MWA archive using the ASVO Web Application, or run the following command from VCSTools:
```
mwa_metadb_utils.py -c <obs ID>
```
This will produce a list of calibrator observations, ranked based on their distance in time to the VCS observation.
Preprocessing with Birli
Although Hyperdrive accepts the raw visibilities, it can be beneficial to downsample the time and frequency resolution to reduce the data size and improve the calibration quality. Preprocessing of FITS data can be done with Birli, which has a variety of options for flagging and averaging data (refer to the Birli help information). In the following example, Birli is used to frequency average to 40 kHz channels and time average to 2 second integrations.
```
birli --metafits <METAFITS_FILE> --uvfits-out <UVFITS_FILE> --avg-freq-res <FREQ_RES> --avg-time-res <TIME_RES>
```
Birli is quite resource intensive, and should be given adequate memory to run. An example Slurm script is given below, which should be executed from within the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID>/hyperdrive
directory, with the FITS files in the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID>
directory.
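As a rough guide, a Birli job on Garrawarla could look like the following sketch (the partition name, resource requests, and birli module name are assumptions that should be adapted to your data):

```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=birli
#SBATCH --output=%x-%j.out
#SBATCH --partition=workq
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=370gb
#SBATCH --time=01:00:00
#SBATCH --export=NONE

module use /pawsey/mwa/software/python3/modulefiles
module load birli
module list

# Get the ObsID from the calibrator metafits in the parent directory
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"

# Average the raw visibilities to 2 s / 40 kHz and write a UVFITS file
# into the parent (<cal ID>) directory
birli \
    --metafits ../*.metafits \
    --uvfits-out ../${obsid}_birli.uvfits \
    --avg-time-res 2 \
    --avg-freq-res 40 \
    ../*.fits
```

This will produce <cal ID>_birli.uvfits in the parent directory, which can be given to Hyperdrive to calibrate.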
Returning to the Setonix containers described above, you can then execute a command via the container of your choice, like so:
Execute a command via a Singularity container
```
singularity exec /software/projects/mwavcs/setonix/2024.05/containers/<container>.sif <command>
```
and, if you need an interactive X-Window session to appear, then you will need to add `-B ~/.Xauthority` after the `exec` command and before the path to the container.
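For example, to run an interactive PSRCHIVE tool from the analysis container with X11 forwarding (a sketch; pazi is just one example of an interactive program, and the container path follows the naming above):

```
singularity exec -B ~/.Xauthority /software/projects/mwavcs/setonix/2024.05/containers/psr-analysis.sif pazi <archive>.ar
```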
Using the containers via modules
We can also make the singularity containers "look" like modules, with all the software available on your regular command line (rather than needing to execute a Singularity pre-command). Nominally, if you follow the below steps, you should be able to see the three containers installed as system modules for our group.
Using Singularity containers as modules
```
module use /software/projects/mwavcs/setonix/2024.05/modules/zen3/gcc/12.2.0
module av psr-
```
which should then show the available psr- modules. These modules can be loaded as per usual. For example, if you load the psr-analysis/24-11-07 module, you will then have access to various post-processing packages from the command line.
These modules can be used as normal within job submissions via SLURM as well.
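For instance, a minimal job script might load a container-backed module and call its packaged tools directly (a sketch; the account name and resource requests are assumptions to adapt):

```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=vap-summary
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

module use /software/projects/mwavcs/setonix/2024.05/modules/zen3/gcc/12.2.0
module load psr-analysis/24-11-07

# PSRCHIVE tools from the container are now on the PATH
vap -c "name dm" *.ar
```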
Using Nextflow
Nextflow is a workflow management software which is described here. It is extremely useful for constructing data processing pipelines, including our own mwa_search
pipeline used for the SMART survey.
On Setonix
Nextflow can be loaded with:
```
module load nextflow
```
Manual Installation
Nextflow requires Java to run. Installation instructions can be found here.
Following Nextflow's installation docs, start by downloading Nextflow with curl
:
```
curl -s https://get.nextflow.io | bash
```
Then make Nextflow executable with chmod
:
```
chmod +x nextflow
```
Move nextflow into a directory in your PATH
in order to execute it anywhere.
Run nextflow self-update
to upgrade to the latest version. This will update the Nextflow executable in-place. To see the available subcommands, run nextflow help
.
Typical Usage
For local pipelines, the basic syntax is:
```
nextflow run pipeline.nf [PIPELINE OPTIONS]
```
Where pipeline.nf
contains an entry workflow (i.e. a workflow with no name).
For pipelines hosted on GitHub and configured for pipeline sharing, you can simply run the pipeline with:
```
nextflow run <username>/<repository> [-r <branch>] [PIPELINE OPTIONS]
```
where -r <branch>
allows you to optionally specify a Git branch to use.
If the pipeline receives updates on GitHub, you can update your locally cached version of the pipeline with:
```
nextflow pull [-r <branch>] <username>/<repository>
```
If the pipeline name is unique among your locally cached pipelines, you can omit the GitHub username on execution.
As a simple example, a typical workflow could look like:
```
nextflow pull cplee1/vcs_recombine
nextflow run vcs_recombine [PIPELINE OPTIONS]
```
In general, you should execute Nextflow within a screen
session so that it can run separate from your terminal session. On Setonix, pipelines must be run on a dedicated workflow login node. A typical workflow would be:
```
# Create a session
screen -S pipeline
# Start the pipeline
nextflow run pipeline.nf
# Detach from the session with Ctrl+A then D
# View all of your running screen sessions
screen -ls
# Reattach to your session
screen -r pipeline
```
Once the pipeline has completed and you no longer need access to the intermediate files, you should delete the work directory to save disk space. The location of the work directory will differ depending on the pipeline, but by default will be placed in the working directory in which the pipeline was launched.
Downloading Data
Using the ASVO Web Application
MWA observation data are accessed via the ASVO. The simplest way to download an observation is via the ASVO Web Application. To find an observation, navigate to the Observations page and use the Search menu filters. To browse all SMART observations, find the Project option in the Advanced Search menu and select project G0057.
For VCS-mode observations, you will need to specify the time offset from the start of the observation and the duration of time to download. The following example shows the Job menu for downloading 600 seconds at the start of an observation.
For correlator-mode observations, such as calibrator observations, navigate to the Visibility Download Job tab and select the delivery location as /scratch
. An example is shown below.
Once the jobs are submitted, they can be monitored via the My Jobs page, which will display the ASVO Job ID and the status of the download. The downloaded data will appear in /scratch/mwavcs/asvo/<ASVO Job ID>
.
Using Giant Squid
Submitting ASVO jobs can also be done from the command line using Giant Squid. To use Giant Squid, you must first set your ASVO API key as an environment variable. You can find your API key in your Profile page on the ASVO Web Application. Then add the following to your .bashrc
```
export MWA_ASVO_API_KEY=<api key>
```
On Setonix, Giant Squid can be loaded with:
```
module load giant-squid/<version>
```
VCS download jobs can be submitted with the submit-volt
subcommand:
```
giant-squid submit-volt --delivery scratch --offset <offset> --duration <duration> <obs ID>
```
Visibility observation download jobs (e.g., for calibration) can be submitted using the submit-vis
subcommand:
```
giant-squid submit-vis --delivery scratch <obs ID>
```
To monitor the status of your ASVO jobs, you can use the list
subcommand:
```
giant-squid list
```
Data Directory Structure
On Setonix, we store VCS data on the scratch partition in user directories. Typically the file structure is the following:
- ${MYSCRATCH}/<obs ID>
    - /raw
        - Raw VCS data, legacy only (.dat files)
    - /combined
        - Combined VCS data (.sub or .dat files)
    - /cal
        - <cal ID>
            - Visibilities (GPU Box or MWAX correlator .fits files)
            - <cal ID>.metafits
            - /hyperdrive
                - Calibration solution (.fits and .bin files)
                - Calibration solution diagnostic plots (phases and amplitudes)
        - ...
    - /pointings
        - <Pointing 1>
            - PSRFITS search-mode output (.fits), and/or
            - VDIF voltage beam output (.vdif and .hdr)
        - ...
    - <obs ID>.metafits
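A skeleton of this layout can be created up front with something like the following (a sketch; <obs ID> and <cal ID> are placeholders):

```
obsid=<obs ID>
calid=<cal ID>
mkdir -p ${MYSCRATCH}/${obsid}/{raw,combined,pointings}
mkdir -p ${MYSCRATCH}/${obsid}/cal/${calid}/hyperdrive
```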
Unpacking recombined data
In some cases, legacy VCS observations have already been recombined and re-uploaded to the MWA archive. In this case, you will have downloaded one .tar file and one .ics file per second recorded, in addition to the usual metadata. An efficient way to extract the voltage channel files in parallel is to use the following script.
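As a sketch, the per-second tarballs can be extracted in parallel with something like the following (paths are placeholders; tune the parallelism to the cores available):

```
# Extract every per-second tarball into the observation's combined/ directory,
# running 8 extractions at a time
cd /path/to/asvo/download
ls *.tar | xargs -n 1 -P 8 -I {} tar -xf {} -C /path/to/<obs ID>/combined/
```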
Calibrating with Hyperdrive
Hyperdrive is the latest generation calibration software for the MWA. It is written in Rust and uses GPU acceleration, making it several times faster than previous MWA calibration software. See the documentation for further details.
We use Hyperdrive to perform direction-independent (DI) calibration using a sky model provided by the user. The more accurately the sky model reflects the input data, the better the convergence of the calibration solution. The sky model source list can either be extracted from an all-sky catalogue such as GLEAM, or from a dedicated model for a well-characterised source such as Centaurus A. To compile a list of 1000 sources within the beam from the "standard puma" catalogue and save it as srclist_1000.yaml
, use Hyperdrive's srclist-by-beam subcommand as shown below.
```
module load srclists
hyperdrive srclist-by-beam --metafits <METAFITS_FILE> --number 1000 ${SRCLISTS_DIR}/srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt srclist_1000.yaml
```
Alternatively, you can browse a list of dedicated source models here: /pawsey/mwa/software/python3/mwa-reduce/mwa-reduce-git/models
.
Calibration is performed using Hyperdrive's di-calibrate subcommand, as shown below.
```
hyperdrive di-calibrate --source-list <SRC_LIST> --data <UVFITS_FILE> <METAFITS_FILE>
```
This will produce hyperdrive_solutions.fits. To inspect the solution, use Hyperdrive's solutions-plot
subcommand to generate two PNG images (for the amplitudes and phases) as shown below.
```
hyperdrive solutions-plot --metafits <METAFITS_FILE> hyperdrive_solutions.fits
```
An example calibration solution is shown below.
The legend is shown in the upper right of each figure. The last (non-flagged) tile is the reference tile, unless a reference tile is selected with --ref-tile
. All other tiles' solutions are divided by the reference's solution before plotting. The gains of the X and Y polarisations of each antenna, gx and gy, are plotted along with the leakage terms Dx and Dy. In the amplitudes plot, the gains should be flat across the band with a value of around 1, and the leakage terms should be around 0. In the phases plot, the gains should exhibit a linear ramp with frequency and the leakage terms should be randomly distributed. Tiles which deviate far from these behaviours should be flagged by providing a space-separated list to the --tile-flags
option in di-calibrate
. Additionally, it is often necessary to flag some fine channels at the edges of each coarse channel. This can be done in di-calibrate
with the --fine-chan-flags-per-coarse-chan
option (for example, selecting channels 0 1 30 31 to flag 2x40 kHz fine channels at the edges of each coarse channel). Also note that the plot range covers the entire data range by default, which is often dominated by bad tiles. To rescale the plots you can either flag the bad tiles or use the --max-amp
and --min-amp
options in solutions-plot
.
Some examples of less-than-ideal calibration solutions are provided here:

In this case, we have a solution that is almost okay, except for two aspects. One is clearly visible if you look closely at Tiles 104-111, where the phase ramp in the first ~third of the band has periodic rippling structure (likely due to loose cabling). Unfortunately, the best course of action is to just flag those tiles outright, as it is not currently possible to flag ranges of fine/coarse channels on a per-tile basis. (One could in principle "manually" open the FITS file containing the solutions and set the corresponding bad partial tile solutions to NaN.) The second is that in many of the tile solution plots, we see a nice phase ramp with individual points above/below the ramp in a periodic fashion. This is indicative that the edge channels have not been correctly flagged. Since our data in this case are at 40 kHz resolution, we can fix this simply by adding --fine-chan-flags-per-coarse-chan 0 1 30 31 to the di-calibrate call.
Once we are satisfied with the solution, the FITS solutions file must be converted to Offringa format in order for VCSBeam to use it. This can be done with Hyperdrive's solutions-convert
subcommand as shown below.
```
hyperdrive solutions-convert --metafits <METAFITS_FILE> hyperdrive_solutions.fits hyperdrive_solutions.bin
```
An example Slurm script is given below, which should be executed from within the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID>/hyperdrive
directory.
```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=hyperdrive
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --partition=gpuq
#SBATCH --gres=tmp:50g,gpu:1
#SBATCH --time=00:10:00
#SBATCH --export=NONE
module use /pawsey/mwa/software/python3/modulefiles
module load hyperdrive
module load srclists
module list
hyperdrive -V
# Get the ObsID from the metafits filename
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"
# For brighter A-team sources, it may be better to use a specific sky model.
# Browse the $srclist_base directory and select a source list, e.g.
#
# CenA: model-CenA-50comp_withalpha.txt
# HerA: model-HerA-27comp_withalpha.txt
# HydA: model-HydA-58comp_withalpha.txt
# PicA: model-PicA-88comp_withalpha.txt
#
# If using a specific model, assign the source list to $srclist_target
srclist_target=
srclist_base=/pawsey/mwa/software/python3/mwa-reduce/mwa-reduce-git/models
if [[ -z $srclist_target ]]; then
# Create a list of 1000 sources from the standard puma catalogue
srclist=srclist_1000.yaml
catalogue_srclist=${SRCLISTS_DIR}/srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt
hyperdrive srclist-by-beam \
--metafits ../*.metafits \
--number 1000 \
$catalogue_srclist \
$srclist
else
# Use a specific source list
srclist=${srclist_base}/${srclist_target}
fi
# Perform DI calibration
# If necessary, flag tiles with --tile-flags <Tile1> <Tile2> ... <TileN>)
hyperdrive di-calibrate \
--source-list $srclist \
--data ../${obsid}_birli.uvfits ../*.metafits \
--fine-chan-flags-per-coarse-chan 0 1 30 31
# Plot the solutions
hyperdrive solutions-plot \
--metafits ../*.metafits \
hyperdrive_solutions.fits
# Convert to Offringa format for VCSBeam
hyperdrive solutions-convert \
--metafits ../*.metafits \
hyperdrive_solutions.fits \
    hyperdrive_solutions.bin
```
The execution time for Hyperdrive will depend on the size of the input files, the source list, and the resources allocated. Using the downsampled UVFITS as input and 1000 sources, this job runs on Garrawarla in under 2 minutes.
Beamforming (Garrawarla)
Finding a target
Compiling a list of pulsars to beamform on can be done with the find_pulsar_in_obs.py
script in vcstools. On Garrawarla, this can be loaded as follows:
```
module load vcstools
```
To find all of the pulsars within a given observation, the syntax is as follows:
```
find_pulsar_in_obs.py -o <obs ID>
```
To find all of the observations associated with a given source, you can either provide a pulsar J name or the equatorial coordinates:
```
find_pulsar_in_obs.py -p <Jname>
find_pulsar_in_obs.py -c <RA_DEC>
```
All of the above options also accept space-separated lists of arguments. For example, given a list of pulsars and a list of observations, to find which observations contain which pulsars, run the following:
```
find_pulsar_in_obs.py -o <obs ID 1> <obs ID 2> ... <obs ID N> -p <Jname 1> <Jname 2> ... <Jname N>
```
Note: For MWAX VCS observations, you must include the --all_volt
option.
VCSBeam requires pointings to be specified in equatorial coordinates. To find these coordinates for a catalogued pulsar, you can use the ATNF catalogue's command line interface:
```
/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif psrcat -e3 -c "raj decj" <Jname>
```
Tied-array beamforming with VCSBeam
VCSBeam is the successor to the legacy VCS tied-array beamformer. It is capable of processing both legacy and MWAX data into any of the following formats:
- fine-channelised (10 kHz) full-Stokes time series in PSRFITS format (-p option)
- fine-channelised (10 kHz) Stokes I time series in PSRFITS format (-N option)
- coarse-channelised (1.28 MHz) complex voltage time series in VDIF format (-v option)
The coarse channelised output is made possible by the inverse PFB, which reverses the 10 kHz channelisation of the initial PFB stage. To run the tied-array beamformer, we use the make_mwa_tied_array_beam
program, which is MPI-enabled to process multiple coarse channels in a single Slurm job. Further details about this command can be found here.
An example Slurm script is given below, where 24 channels are being processed across 24 nodes. In this case, the -p
option is used to specify PSRFITS output.
```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=beamform
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
#SBATCH --mem=32gb
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --export=NONE
module use /pawsey/mwa/software/python3/modulefiles
module load vcsbeam
module list
make_mwa_tied_array_beam -V
#===============================================================================
# Required inputs
#-------------------------------------------------------------------------------
# path to VCS metafits file
metafits=
# path to combined (.dat) or MWAX (.sub) data directory
datadir=
# path to calibration solution from hyperdrive (should be a .bin file)
calsol=
# path to the calibrator observation metafits
calmetafits=
# the starting GPS second of the observation
startgps=
# how many seconds to process
duration=
# lowest coarse channel
lowchan=
#===============================================================================
srun make_mwa_tied_array_beam \
-m ${metafits} \
-b ${startgps} \
-T ${duration} \
-f ${lowchan} \
-d ${datadir} \
-P ${PWD}/pointings.txt \
-F ${PWD}/flagged_tiles.txt \
-c ${calmetafits} \
-C ${calsol} \
    -p -R NONE -U 0,0 -O -X --smart
```
Alternatively, the beamforming can be split into multiple smaller jobs using a Slurm job array. This can save time if the Slurm queue is busy. An example is given below for channels 109 to 132, where in this case the -v
option is used to specify VDIF output.
```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=beamform
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
#SBATCH --mem=32gb
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --export=NONE
#SBATCH --array=109-132
module use /pawsey/mwa/software/python3/modulefiles
module load vcsbeam
module list
make_mwa_tied_array_beam -V

# Define metafits, datadir, calsol, calmetafits, startgps, and duration as in
# the previous example, then beamform the coarse channel given by the task ID
srun make_mwa_tied_array_beam \
    -m ${metafits} \
    -b ${startgps} \
    -T ${duration} \
    -f ${SLURM_ARRAY_TASK_ID} \
    -d ${datadir} \
    -P ${PWD}/pointings.txt \
    -F ${PWD}/flagged_tiles.txt \
    -c ${calmetafits} \
    -C ${calsol} \
    -v -R NONE -U 0,0 -O -X --smart
```
The amount of memory requested is definitely overkill, but is close to the nominal memory allocated based on the number of CPUs. Providing this option just makes things more explicit and easier to schedule in the queue, usually.
Calibration
Finding a calibration observation
In order to form a coherent tied-array beam, the antenna delays and gains need to be calibrated. This is commonly done using a dedicated observation of a bright calibrator source such as Centaurus A or Hercules A. Calibrator observations are taken in correlator mode and stored as visibilities. To find a calibration observation, either search the MWA archive using the ASVO Web Application, or run the following command from VCSTools:
```
mwa_metadb_utils.py -c <obs ID>
```
This will produce a list of calibrator observations, ranked based on their distance in time to the VCS observation.
VCSTools is not currently installed on Setonix. (As of January 2025)
Preprocessing with Birli
Although Hyperdrive accepts the raw visibilities, it is beneficial to downsample the time and frequency resolution to reduce the data size and improve the calibration quality. Preprocessing of raw visibilities can be done with Birli, which has a variety of options for flagging and averaging data (refer to the Birli help information). In general, 2 sec / 40 kHz resolution is a good starting point. Birli can be loaded on Setonix with:
```
module load birli/<version>
```
The following demonstrates the typical usage:
```
birli --metafits <METAFITS_FILE> --uvfits-out <UVFITS_FILE> --avg-freq-res <FREQ_RES> --avg-time-res <TIME_RES>
```
In general, it is also a good idea to specify the maximum allowed RAM for Birli to use with the --max-memory
option, which accepts a number corresponding to how many GB of RAM to allocate. Otherwise, Birli will attempt to load the entire raw visibility data set, which often requires >100 GB of memory.
Info: On Setonix, specifying the maximum allowed memory is particularly important, as jobs will be terminated prematurely if they try to access more memory than allocated based on the number of CPUs in the resource request. There is approximately 1.79 GB of RAM available per CPU core requested.
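For example, a typical invocation capping Birli at roughly 20 GB of RAM might look like this (file names are placeholders):

```
birli --metafits <METAFITS_FILE> --uvfits-out <UVFITS_FILE> --avg-freq-res 40 --avg-time-res 2 --max-memory 20 <RAW_FITS_FILES>
```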
Example Birli job on Setonix
This script would be submitted from the ${MYSCRATCH}/<obs ID>/cal/<cal ID>/hyperdrive
directory.
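The contents of that script will resemble the following sketch (the resource requests are assumptions sized to the ~1.79 GB/core rule of thumb above; adapt them to your data):

```
#!/bin/bash -l
#SBATCH --account=mwavcs
#SBATCH --job-name=birli
#SBATCH --output=%x-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=28G
#SBATCH --time=01:00:00

module load birli/<version>

# Get the ObsID from the calibrator metafits in the parent directory
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"

# Average to 2 s / 40 kHz, keeping Birli's memory use below the job allocation
birli \
    --metafits ../*.metafits \
    --uvfits-out ../${obsid}_birli.uvfits \
    --avg-time-res 2 \
    --avg-freq-res 40 \
    --max-memory 20 \
    ../*.fits
```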
Calibrating with Hyperdrive
Hyperdrive is the latest generation calibration software for the MWA. It is written in Rust and uses GPU acceleration, making it several times faster than previous MWA calibration software. See the documentation for further details. Hyperdrive can be loaded on Setonix with:
Code Block | ||
---|---|---|
| ||
module load hyperdrive/<version> |
We use Hyperdrive to perform direction-independent (DI) calibration using a sky model provided by the user. The more accurately the sky model reflects the input data, the better the convergence of the calibration solution. The sky model source list can be extracted from an all-sky catalogue such as GLEAM, or one can use a dedicated model for a well-characterised source such as Centaurus A. To compile a list of 1000 sources within the beam from the GLEAM Galactic Sky Model and save it as srclist_1000.yaml
, use Hyperdrive's srclist-by-beam subcommand as shown below:
```
hyperdrive srclist-by-beam --metafits <METAFITS_FILE> --number 1000 /software/projects/mwavcs/GGSM_updated_2023Oct23.txt srclist_1000.yaml
```
Calibration is performed using Hyperdrive's di-calibrate subcommand, as shown below.
```
hyperdrive di-calibrate --source-list <SRC_LIST> --data <UVFITS_FILE> <METAFITS_FILE> --fine-chan-flags-per-coarse-chan <CHAN_FLAGS>
```
Info: Note that for 40 kHz frequency channels, we typically flag channels 0 1 30 31, i.e. the two fine channels at each edge of every coarse channel.
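For example, with the 40 kHz averaging used above, a typical call looks like the following (file names are placeholders):

```
hyperdrive di-calibrate --source-list srclist_1000.yaml --data <UVFITS_FILE> <METAFITS_FILE> --fine-chan-flags-per-coarse-chan 0 1 30 31
```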
This will produce hyperdrive_solutions.fits
. To inspect the solution, use Hyperdrive's solutions-plot
subcommand to generate two PNG images (for the amplitudes and phases) as shown below.
```
hyperdrive solutions-plot --metafits <METAFITS_FILE> hyperdrive_solutions.fits
```
An example calibration solution is shown below. The legend is shown in the upper right of each figure. The last (non-flagged) tile is the reference tile, unless a reference tile is selected with --ref-tile
. All other tiles' solutions are divided by the reference's solution before plotting. The gains of the X and Y polarisations of each antenna, gx and gy, are plotted along with the leakage terms Dx and Dy. In the amplitudes plot, the gains should be flat across the band with a value of around 1, and the leakage terms should be around 0. In the phases plot, the gains should exhibit a linear ramp with frequency and the leakage terms should be randomly distributed. Tiles which deviate far from these behaviours should be flagged by providing a space-separated list to the --tile-flags
option in di-calibrate
. Also note that the plot range covers the entire data range by default, which is often dominated by bad tiles. To rescale the plots you can either flag the bad tiles or use the --max-amp
and --min-amp
options in solutions-plot
.
Some examples of less-than-ideal calibration solutions are provided here:

In this case, we have a solution that is almost okay, except for two aspects. One is clearly visible if you look closely at Tiles 104-111, where the phase ramp in the first ~third of the band has periodic rippling structure (likely due to loose cabling). Unfortunately, the best course of action is to just flag those tiles outright, as it is not currently possible to flag ranges of fine/coarse channels on a per-tile basis. (One could in principle "manually" open the FITS file containing the solutions and set the corresponding bad partial tile solutions to NaN.) The second is that in many of the tile solution plots, we see a nice phase ramp with individual points above/below the ramp in a periodic fashion. This is indicative that the edge channels have not been correctly flagged. Since our data in this case are at 40 kHz resolution, we can fix this simply by adding --fine-chan-flags-per-coarse-chan 0 1 30 31 to the di-calibrate call.
Once we are satisfied with the solution, the FITS solutions file must be converted to Offringa format in order for VCSBeam to use it. This can be done with Hyperdrive's solutions-convert
subcommand as shown below.
```
hyperdrive solutions-convert --metafits <METAFITS_FILE> hyperdrive_solutions.fits hyperdrive_solutions.bin
```
Example Hyperdrive job on Setonix
The following script will run Hyperdrive as described above. The script must be executed from the ${MYSCRATCH}/<obs ID>/cal/<cal ID>/hyperdrive
directory.
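As a rough guide, that script will look something like the following sketch (the account, partition, and GPU request are assumptions based on the notes below; the Hyperdrive commands are those described above):

```
#!/bin/bash -l
#SBATCH --account=mwavcs-gpu
#SBATCH --partition=mwa-gpu
#SBATCH --job-name=hyperdrive
#SBATCH --output=%x-%j.out
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00

module load hyperdrive/<version>

# Get the ObsID from the calibrator metafits in the parent directory
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"

# Create a source list, calibrate, plot the solutions, and convert to
# Offringa format for VCSBeam
hyperdrive srclist-by-beam --metafits ../*.metafits --number 1000 \
    /software/projects/mwavcs/GGSM_updated_2023Oct23.txt srclist_1000.yaml
hyperdrive di-calibrate --source-list srclist_1000.yaml \
    --data ../${obsid}_birli.uvfits ../*.metafits \
    --fine-chan-flags-per-coarse-chan 0 1 30 31
hyperdrive solutions-plot --metafits ../*.metafits hyperdrive_solutions.fits
hyperdrive solutions-convert --metafits ../*.metafits \
    hyperdrive_solutions.fits hyperdrive_solutions.bin
```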
Info: On a Setonix GPU node, requesting a GPU actually reserves an "allocation-pack" - see https://pawsey.atlassian.net/wiki/spaces/US/pages/51928618/Setonix+GPU+Partition+Quick+Start
Note: This means that for each GPU requested we have 8 CPU cores and ~25 GB RAM. In the case of running Hyperdrive, this memory needs to be able to store the entire UVFITS output from Birli. If the UVFITS is larger than ~25 GB, then you will need to use 2 "allocation-packs" worth of resources, which can be achieved by requesting a second GPU in the job script.
Recombining Legacy VCS Data
If you are using Setonix, there is a Nextflow pipeline that is available to make recombining data simple. To use it, run:
```
nextflow run cplee1/vcs_recombine --obsid <OBSID> --download_dir <DOWNLOAD_DIR> [--vcs_dir <VCS_DIR>]
```
The pipeline will look in the directory specified with the --download_dir
option and determine what data is available. The combined data will be placed in <VCS_DIR>/<OBSID>/combined
. The metafits file (with the name <OBSID>_metafits_ppds.fits
) should exist in either the download directory or the <VCS_DIR>/<OBSID>
directory. The default for --vcs_dir
is /scratch/mwavcs/$USER
.
For example:
```
nextflow run cplee1/vcs_recombine --obsid 1261241272 --download_dir /scratch/mwavcs/asvo/123456
```
In this case, the pipeline would recombine the files in /scratch/mwavcs/asvo/123456
and place them in /scratch/mwavcs/$USER/1261241272/combined
.
By default, the pipeline will submit one job per 32 seconds of data, so a full SMART observation will be split into 150 jobs. Each job should take ~10-20 min to complete.
The pipeline does not delete the raw data. Therefore, to save disk space, it is recommended that you manually delete the raw data as soon as the pipeline has successfully completed.
Note: This workflow does NOT untar data files, so you should only run this once you're confident the legacy data have been extracted into the download directory.
Beamforming
Finding a target
Pulsars can be located in the MWA primary beam using the mwa-source-finder, a pip-installable Python package with a command line interface. Python environments on Setonix are managed at the user level, so this software should be installed into a Python virtual environment. For example:
```
module load python/3.11.6
module load hyperbeam/0.9.3
module load py-numpy/1.26.1
python -m venv /path/to/myenv
source /path/to/myenv/bin/activate
python -m pip install git+https://github.com/cplee1/mwa_source_finder.git
```
Example 1: Finding all pulsars in an observation
```
source-finder -o 1221399680
```
To be specific, the software defines "in the beam" as the zenith-normalised beam power towards the source exceeding a specified minimum (by default 0.3, i.e. 30%). By default it evaluates the primary beam model at the centre frequency of the observation, but this can be changed with --freq_mode {low/centre/high/multi}.
Example 2: Finding all observations for a pulsar/source
```
source-finder -s J2241-5236 --obs_for_source [--filter_available]
```
The --filter_available
option will exclude observations with no data files available. The source(s) can be specified as pulsar B/J names, or RA/Dec coordinates in sexagesimal or decimal form. The following are all acceptable: "B1451-68" "J1456-6843" "14:55:59.92_-68:43:39.50" "223.999679_-68.727639".
Example 3: A more complex query to find pulsars in an observation
```
source-finder -o 1221399680 --start 0.1 --end 0.5 --min_power 0.5 --condition 'P0<0.03 && DM<100' --time_plot
```
The above example will search for pulsars matching the given PSRCAT query that exceed a zenith-normalised beam power of 0.5 between the fractional observation times 0.1 and 0.5. It will also make a plot of the source power traces over time.
Pulsar coordinates can be obtained from the ATNF Pulsar Catalogue web portal. The coordinates should be written in a text file (RA and Dec in sexagesimal format separated by a space, one set of coordinates per line) to give to VCSBeam.
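For example, a pointings file containing a single pointing (using the coordinates quoted above) would simply contain:

```
14:55:59.92 -68:43:39.50
```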
Tied-array beamforming with VCSBeam
VCSBeam is the successor to the legacy VCS tied-array beamformer. It is capable of processing both legacy and MWAX data into any of the following formats:
- fine-channelised (10 kHz) full-Stokes time series in PSRFITS format (-p option)
- fine-channelised (10 kHz) Stokes I time series in PSRFITS format (-N option)
- coarse-channelised (1.28 MHz) complex voltage time series in VDIF format (-v option)
The coarse channelised output is made possible by the inverse PFB, which reverses the 10 kHz channelisation of the initial PFB stage. To run the tied-array beamformer, we use the make_mwa_tied_array_beam
program, which is MPI-enabled to process multiple coarse channels in a single Slurm job. Further details about this command can be found here.
An example Slurm script is given below, where 24 channels are being processed across 24 nodes. In this case, the -p
option is used to specify PSRFITS output.
beamform.sbatch
```
#!/bin/bash -l
#SBATCH --account=mwavcs-gpu
#SBATCH --partition=mwa-gpu
#SBATCH --job-name=beamform
#SBATCH --output=%x-%j.out
#SBATCH --ntasks=24
#SBATCH --gpus-per-task=1
#SBATCH --time=01:00:00

module load vcsbeam/v5.2
make_mwa_tied_array_beam -V

#===============================================================================
# Required inputs
#-------------------------------------------------------------------------------
# path to VCS metafits file
metafits=
# path to combined (.dat) or MWAX (.sub) data directory
datadir=
# path to calibration solution from hyperdrive (should be a .bin file)
calsol=
# path to the calibrator observation metafits
calmetafits=
# the starting GPS second of the observation
startgps=
# how many seconds to process
duration=
# lowest coarse channel index
lowchan=
# path to pointings file
pointings=
#===============================================================================

srun make_mwa_tied_array_beam \
    -m ${metafits} \
    -b ${startgps} \
    -T ${duration} \
    -f ${lowchan} \
    -d ${datadir} \
    -P ${pointings} \
    -c ${calmetafits} \
    -C ${calsol} \
    -p -R NONE -U 0,0 -O -X -s
```
For VDIF output, the beamforming can be split into multiple smaller jobs using a Slurm job array. This is done with the sbatch option --array=LOWCHAN-HIGHCHAN. The -f option in make_mwa_tied_array_beam should then be given $SLURM_ARRAY_TASK_ID, as sketched below.
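For example, relative to the script above, the array variant changes the job geometry and channel selection roughly as follows (a sketch for channels 109-132 with VDIF output):

```
#SBATCH --ntasks=1
#SBATCH --array=109-132

srun make_mwa_tied_array_beam \
    -m ${metafits} \
    -b ${startgps} \
    -T ${duration} \
    -f ${SLURM_ARRAY_TASK_ID} \
    -d ${datadir} \
    -P ${pointings} \
    -c ${calmetafits} \
    -C ${calsol} \
    -v -R NONE -U 0,0 -O -X -s
```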