Introduction
Since the initial creation of the legacy documentation, there have been significant changes to the VCS data processing workflow due to the introduction of the MWAX correlator and the new beamforming software, VCSBeam. The purpose of this documentation is to provide an updated description of the processing workflow. Please note that this page is a work in progress, and any questions should be directed to Christopher Lee.
This documentation deals with data collected with the MWA's Voltage Capture System (VCS), which is described in Tremblay et al. (2015). The MWA tied-array processing paper series provides details about calibration, beamforming, and signal processing (see Ord et al. (2015), Xue et al. (2019), McSweeney et al. (2020), and Swainston et al. (2022)). Additionally, the VCSBeam documentation provides useful information about calibration and beamforming.
While not necessary, it will be useful to have a basic understanding of radio interferometry when reading this documentation, so this course may be a useful resource. Additionally, the PRESTO search tutorial provides a good introduction to the software tools available in PRESTO.
Environment setup
Loading modules
Most of our data processing is performed on Pawsey's Garrawarla cluster. Installed software on Garrawarla is managed using Lmod, which keeps the login environment tidy and reproducible. Detailed documentation is provided by Pawsey, but the basic usage is summarised below.
To access the custom MWA software modules, first run the following command:
Code Block | ||||
---|---|---|---|---|
| ||||
module use /pawsey/mwa/software/python3/modulefiles |
You can browse the available modules which do not conflict with your machine using
Code Block | ||||
---|---|---|---|---|
| ||||
module avail |
To load a module, run the command
Code Block | ||||
---|---|---|---|---|
| ||||
module load <module> |
For example, to use Singularity containers, you must first load the singularity module like so:
Code Block | ||||
---|---|---|---|---|
| ||||
module load singularity |
Loading modules via a bash profile
It is often useful to have commonly used modules loaded automatically, both for convenience and to keep the environment consistent. This can be done by creating a bash profile and sourcing it when you want to load the modules. For pulsar processing, we have a standard profile which can be loaded by adding the following to your .bashrc:
Code Block | ||||
---|---|---|---|---|
| ||||
alias sp3='source /pawsey/mwa/software/profiles/mwavcs_pulsar.profile' |
Running the command sp3 will load most of the standard VCS software and dependencies. The standard profile can also be used as a template when creating a custom profile.
Using common pulsar software
Pulsar software is difficult to install at the best of times, so the common packages are not currently natively installed on Garrawarla, but are provided via containerisation. There are two generic Singularity containers available to users that focus on two different aspects of pulsar science/processing.
The psr-search container includes common pulsar searching tools such as PRESTO and riptide (an FFA implementation). It can be accessed as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
/pawsey/mwa/singularity/psr-search/psr-search.sif <command> |
The psr-analysis container includes common pulsar analysis tools such as DSPSR, PSRCHIVE, and various timing packages. It can be accessed as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif <command> |
For interactive programs, the containers must be run through Singularity as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
singularity run -B ~/.Xauthority <container> <command> |
To make using these containers easier, you can add the file paths as environment variables in your .bashrc like so:
Code Block | ||||
---|---|---|---|---|
| ||||
export PSR_SEARCH_CONT=/pawsey/mwa/singularity/psr-search/psr-search.sif
export PSR_ANALYSIS_CONT=/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif |
For further convenience, you can set aliases for common commands, e.g.
Code Block | ||||
---|---|---|---|---|
| ||||
alias pam="$PSR_ANALYSIS_CONT pam"
alias pav="singularity run -B ~/.Xauthority $PSR_ANALYSIS_CONT pav" |
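Aliases are not expanded in non-interactive shells by default, so they will not be available inside a Slurm batch script. One possible workaround is to define shell functions instead. The following is a minimal sketch: the function names psr_search, psr_analysis and psr_analysis_x11 are hypothetical, while the container paths are the ones given above.

```shell
# Container paths from this page; export the variables beforehand to override them.
: "${PSR_SEARCH_CONT:=/pawsey/mwa/singularity/psr-search/psr-search.sif}"
: "${PSR_ANALYSIS_CONT:=/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif}"
export PSR_SEARCH_CONT PSR_ANALYSIS_CONT

# Hypothetical helper: run a command inside the psr-search container.
psr_search() {
    "${PSR_SEARCH_CONT}" "$@"
}

# Hypothetical helper: run a command inside the psr-analysis container.
psr_analysis() {
    "${PSR_ANALYSIS_CONT}" "$@"
}

# Hypothetical helper: run an interactive program from the psr-analysis container.
psr_analysis_x11() {
    singularity run -B ~/.Xauthority "${PSR_ANALYSIS_CONT}" "$@"
}
```

With these definitions sourced, a call such as psr_analysis pam <options> behaves like the pam alias, but also works from within scripts.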
Downloading data
Using the ASVO Web Application
MWA observation data are accessed via the ASVO. The simplest way to download an observation is via the ASVO Web Application. To find an observation, navigate to the Observations page and use the Search menu filters. To see all SMART observations, find the Project option in the Advanced Search menu and select project G0057.
For VCS-mode observations, you will need to specify the time offset and the duration of time to download. The following example shows the Job menu for downloading 600 seconds at the start of an observation.
For correlator-mode observations, such as calibrator observations, navigate to the Visibility Download Job tab and select the delivery location as /astro. An example is shown below.
Once the jobs are submitted, they can be monitored via the My Jobs page, which will display the ASVO Job ID and the status of the download. The downloaded data will appear in /astro/mwavcs/asvo/<ASVO Job ID>.
Using Giant Squid
Submitting ASVO jobs can also be done from the command line using Giant Squid. To use Giant Squid, you must first set your ASVO API key as an environment variable. You can find your API key on your Profile page on the ASVO Web Application. Then add the following to your .bashrc:
Code Block | ||||
---|---|---|---|---|
| ||||
export MWA_ASVO_API_KEY=<api key> |
Currently, the voltage download mode is only available through the Docker image, since a module for version 0.7.0 is yet to be created. To use the Docker image, add the following alias to your .bashrc:
Code Block | ||||
---|---|---|---|---|
| ||||
alias giant-squid='module load singularity; singularity exec -B $PWD docker://mwatelescope/giant-squid:latest /opt/cargo/bin/giant-squid' |
The first time that giant-squid is run, it will cache the image for subsequent runs. Alternatively, there is a version of giant-squid located at /astro/mwavcs/software/giant-squid.sif that anyone in the group can use. It was compiled on 1 Nov 2023. To use that pre-installed version, replace the above alias with:
Code Block | ||||
---|---|---|---|---|
| ||||
alias giant-squid='module load singularity; singularity exec -B $PWD /astro/mwavcs/software/giant-squid.sif /opt/cargo/bin/giant-squid' |
VCS download jobs can then be submitted with the submit-volt subcommand as follows:
Code Block | ||||
---|---|---|---|---|
| ||||
giant-squid submit-volt --delivery astro --offset <offset> --duration <duration> <obs ID> |
If the command is being used in a pipeline, then the --wait option can be used to keep the program running until the observation is ready for download. Calibrator observation download jobs can be submitted using the submit-vis subcommand:
Code Block | ||||
---|---|---|---|---|
| ||||
giant-squid submit-vis --delivery astro <obs ID> |
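The --offset value is counted in seconds from the start of the observation. Assuming that the obs ID is the GPS second at which the observation began, the offset for a desired GPS start second can be computed as in the sketch below. The function name vcs_offset is hypothetical and the numbers in the example are made up.

```shell
# Hypothetical helper: offset (in seconds) of a GPS second relative to the observation start.
# Assumes the obs ID equals the GPS second at which the observation began.
vcs_offset() {
    local obsid="$1" startgps="$2"
    if [ "${startgps}" -lt "${obsid}" ]; then
        echo "error: start GPS second ${startgps} precedes obs ID ${obsid}" >&2
        return 1
    fi
    echo $(( startgps - obsid ))
}

# Example with made-up numbers: start 600 s into the observation.
vcs_offset 1255444104 1255444704   # prints: 600
```

The result can then be passed to submit-volt, e.g. --offset "$(vcs_offset <obs ID> <start GPS>)".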
The standard directory structure
Due to the large data volume of VCS observations and to assist with the automation of many of the repetitive processing steps, we use a standard directory structure to store downloaded data. Data on Garrawarla are currently stored on the /astro partition under the /astro/mwavcs group directory. In the past, downloaded data have been stored in /astro/mwavcs/vcs/<obs ID>; however, pooling all users' downloads into one directory can be messy and difficult to manage. Therefore, VCS downloads should be stored in the user's personal VCS directory, i.e. /astro/mwavcs/${USER}/<obs ID>. Within this directory, the raw data (.sub or .dat files) should be stored in a subdirectory called combined, while the metafits file of the VCS observation should remain in the parent directory. A second subdirectory called cal should contain a directory for each calibration observation, within which are the visibilities and the calibration metafits. Within each calibrator's directory, the calibration solutions are stored within a subdirectory called hyperdrive. Beamformed data are stored under a directory called pointings, organised by source name. This is summarised below:
- /astro/mwavcs/${USER}/<obs ID>
  - /combined
    - Raw VCS data (.sub or .dat files)
  - /cal
    - <cal ID>
      - Visibilities (.fits files)
      - <cal ID>.metafits
      - /hyperdrive
        - Calibration solution
    - ...
  - /pointings
    - <Pointing 1>
    - ...
  - <obs ID>.metafits
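The layout above could be created ahead of a download with a small helper along the following lines. This is only a sketch: the function name make_vcs_dirs is hypothetical, and the directory names are taken from the description above.

```shell
# Hypothetical helper: create the standard directory tree for one observation.
# Usage: make_vcs_dirs <base dir> <obs ID> [<cal ID> ...]
make_vcs_dirs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_vcs_dirs <base dir> <obs ID> [<cal ID> ...]" >&2
        return 1
    fi
    local base="$1" obsid="$2" calid
    shift 2
    local obsdir="${base}/${obsid}"
    mkdir -p "${obsdir}/combined" "${obsdir}/cal" "${obsdir}/pointings"
    for calid in "$@"; do
        # Calibration solutions go in a hyperdrive subdirectory for each calibrator
        mkdir -p "${obsdir}/cal/${calid}/hyperdrive"
    done
    # Print the observation directory so that callers can cd into it
    echo "${obsdir}"
}
```

For example, make_vcs_dirs /astro/mwavcs/${USER} <obs ID> <cal ID> would create the combined, cal/<cal ID>/hyperdrive and pointings directories for that observation.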
Calibration
Finding a calibration observation
In order to form a coherent tied-array beam, the antenna delays and gains need to be calibrated. This is commonly done using a dedicated observation of a bright calibrator source such as Centaurus A or Hercules A. Calibrator observations are taken in correlator mode and stored as visibilities. To find a calibration observation, either search the MWA archive using the ASVO Web Application, or run the following command from VCSTools:
Code Block | ||||
---|---|---|---|---|
| ||||
mwa_metadb_utils.py -c <obs ID> |
This will produce a list of calibrator observations, ranked based on their distance in time to the VCS observation.
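To illustrate what ranking by distance in time means, the sketch below sorts a list of calibrator obs IDs by their absolute separation from the VCS obs ID, assuming that obs IDs are GPS seconds. It is not how mwa_metadb_utils.py is implemented; the function name rank_calibrators is hypothetical and the IDs in the example are made up.

```shell
# Hypothetical helper: sort calibrator obs IDs by time separation from a VCS obs ID.
# Usage: rank_calibrators <obs ID> <cal ID> [<cal ID> ...]
rank_calibrators() {
    local obsid="$1" calid diff
    shift
    for calid in "$@"; do
        diff=$(( calid - obsid ))
        # ${diff#-} strips a leading minus sign, giving the absolute value
        echo "${diff#-} ${calid}"
    done | sort -n | cut -d' ' -f2
}

# Example with made-up IDs: prints 1050, 900, 1200 (one per line)
rank_calibrators 1000 900 1050 1200
```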
Preprocessing with Birli
Although Hyperdrive accepts the raw visibilities, it can be beneficial to downsample the time and frequency resolution to reduce the data size and improve the calibration quality. Preprocessing of FITS data can be done with Birli, which has a variety of options for flagging and averaging data (refer to the Birli help information). In the following example, Birli is used to frequency average to 40 kHz channels and time average to 2 second integrations.
Code Block | ||||
---|---|---|---|---|
| ||||
birli --metafits <METAFITS_FILE> --uvfits-out <UVFITS_FILE> --avg-freq-res <FREQ_RES> --avg-time-res <TIME_RES> <FITS_FILES> |
Birli is quite resource intensive, and should be given adequate memory to run. An example Slurm script is given below, which should be executed from within the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID>/hyperdrive directory, with the FITS files in the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID> directory.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
#!/bin/bash -l

#SBATCH --account=mwavcs
#SBATCH --job-name=birli
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --ntasks=36
#SBATCH --ntasks-per-node=36
#SBATCH --mem=370gb
#SBATCH --partition=workq
#SBATCH --time=01:00:00
#SBATCH --tmp=440G
#SBATCH --export=NONE

module use /pawsey/mwa/software/python3/modulefiles
module load birli
module list

birli -V

fres=40  # desired freq. resolution in kHz for cal. UVFITS
tres=2   # desired time resolution in seconds for cal. UVFITS

# Extract the obsid of the calibrator from the metafits file
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"

# Make the downsampled uvfits data
birli \
    --metafits ../*.metafits \
    --uvfits-out /nvmetmp/${obsid}_birli.uvfits \
    --avg-time-res ${tres} \
    --avg-freq-res ${fres} \
    ../*ch???*.fits

# Copy the data from the nvme to the cal directory
cp /nvmetmp/${obsid}_birli*.uvfits .. |
This will produce <cal ID>_birli.uvfits in the parent directory, which can be given to Hyperdrive to calibrate.
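The script above derives the obs ID from the output of ls ../*.metafits, which gives a wrong result if the directory holds no metafits file or more than one. A more defensive variant might look like the sketch below; the function name obsid_from_metafits is hypothetical.

```shell
# Hypothetical helper: print the obs ID taken from the single metafits file in a directory.
# Usage: obsid_from_metafits [<directory>]   (defaults to the parent directory)
obsid_from_metafits() {
    local dir="${1:-..}" count=0 mfits="" f
    for f in "${dir}"/*.metafits; do
        # An unmatched glob is left as a literal pattern, so check that the file exists
        [ -e "${f}" ] || continue
        count=$(( count + 1 ))
        mfits="${f}"
    done
    if [ "${count}" -ne 1 ]; then
        echo "error: expected exactly one metafits file in ${dir}, found ${count}" >&2
        return 1
    fi
    # Strip the directory and the .metafits extension
    mfits=$(basename -- "${mfits}")
    echo "${mfits%.*}"
}
```

In a batch script this could be used as obsid=$(obsid_from_metafits ..) || exit 1, so that the job stops before Birli is launched on ambiguous input.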
Calibrating with Hyperdrive
Hyperdrive is the latest generation calibration software for the MWA. It is written in Rust and uses GPU acceleration, making it several times faster than previous MWA calibration software. See the documentation for further details.
We use Hyperdrive to perform direction-independent (DI) calibration using a sky model provided by the user. The more accurately the sky model reflects the input data, the better the convergence of the calibration solution. The sky model source list can either be extracted from an all-sky catalogue such as GLEAM, or from a dedicated model for a well-characterised source such as Centaurus A. To compile a list of 1000 sources within the beam from the "standard puma" catalogue and save it as srclist_1000.yaml, use Hyperdrive's srclist-by-beam subcommand as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
module load srclists
hyperdrive srclist-by-beam --metafits <METAFITS_FILE> --number 1000 ${SRCLISTS_DIR}/srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt srclist_1000.yaml |
Alternatively, you can browse a list of dedicated source models here: /pawsey/mwa/software/python3/mwa-reduce/mwa-reduce-git/models.
Calibration is performed using Hyperdrive's di-calibrate subcommand, as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
hyperdrive di-calibrate --source-list <SRC_LIST> --data <UVFITS_FILE> <METAFITS_FILE> |
This will produce hyperdrive_solutions.fits. To inspect the solution, use Hyperdrive's solutions-plot subcommand to generate two PNG images (for the amplitudes and phases) as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
hyperdrive solutions-plot --metafits <METAFITS_FILE> hyperdrive_solutions.fits |
An example calibration solution is shown below.
The legend is shown in the upper right of each figure. The last (non-flagged) tile is the reference tile, unless a reference tile is selected with --ref-tile. All other tiles' solutions are divided by the reference's solution before plotting. The gains of the X and Y polarisations of each antenna, gx and gy, are plotted along with the leakage terms Dx and Dy. In the amplitudes plot, the gains should be flat across the band with a value of around 1, and the leakage terms should be around 0. In the phases plot, the gains should exhibit a linear ramp with frequency and the leakage terms should be randomly distributed. Tiles which deviate far from these behaviours should be flagged by providing a space-separated list to the --tile-flags option in di-calibrate. Additionally, it is often necessary to flag some fine channels at the edges of each coarse channel. This can be done in di-calibrate with the --fine-chan-flags-per-coarse-chan option (for example, selecting channels 0 1 30 31 to flag 2x40 kHz fine channels at the edges of each coarse channel). Also note that the plot range covers the entire data range by default, which is often dominated by bad tiles. To rescale the plots you can either flag the bad tiles or use the --max-amp and --min-amp options in solutions-plot.
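The channel list 0 1 30 31 follows from the channel widths: a 1.28 MHz coarse channel averaged to 40 kHz contains 32 fine channels, numbered 0 to 31, so flagging two channels per edge selects the first two and the last two. The sketch below reproduces that arithmetic for other resolutions; the function name edge_chan_flags is hypothetical.

```shell
# Hypothetical helper: list the fine channels at both edges of a coarse channel.
# Usage: edge_chan_flags <fine channel width in kHz> <channels to flag per edge>
edge_chan_flags() {
    local fres="$1" nedge="$2"
    local coarse_khz=1280   # MWA coarse channel width (1.28 MHz)
    local nfine=$(( coarse_khz / fres ))
    local i=0 flags=""
    # Lower edge: channels 0 .. nedge-1
    while [ "${i}" -lt "${nedge}" ]; do
        flags="${flags} ${i}"
        i=$(( i + 1 ))
    done
    # Upper edge: channels nfine-nedge .. nfine-1
    i=$(( nfine - nedge ))
    while [ "${i}" -lt "${nfine}" ]; do
        flags="${flags} ${i}"
        i=$(( i + 1 ))
    done
    # Drop the leading space before printing
    echo "${flags# }"
}

edge_chan_flags 40 2   # prints: 0 1 30 31
```

The output can be substituted directly into the calibration command, e.g. --fine-chan-flags-per-coarse-chan $(edge_chan_flags 40 2).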
Some examples of less-than-ideal calibration solutions are provided here:
In this case, we have a solution that is almost okay, except for two aspects. One is clearly visible if you look closely at Tiles 104-111, where the phase ramp in the first ~third of the band has a periodic rippling structure (likely due to loose cabling). Unfortunately, the best course of action is to just flag those tiles outright, as it is not currently possible to flag ranges of fine/coarse channels on a per-tile basis. (One could in principle "manually" open the FITS file containing the solutions and set the corresponding bad partial tile solutions to NaN.) The second is that in many of the tile solution plots, we see a nice phase ramp with individual points above/below the ramp in a periodic fashion. This indicates that the edge channels have not been correctly flagged. Since our data in this case are at 40 kHz resolution, we can fix this simply by adding
Once we are satisfied with the solution, the FITS solutions file must be converted to Offringa format in order for VCSBeam to use it. This can be done with Hyperdrive's solutions-convert subcommand as shown below.
Code Block | ||||
---|---|---|---|---|
| ||||
hyperdrive solutions-convert --metafits <METAFITS_FILE> hyperdrive_solutions.fits hyperdrive_solutions.bin |
An example Slurm script is given below, which should be executed from within the /astro/mwavcs/${USER}/<obs ID>/cal/<cal ID>/hyperdrive directory.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
#!/bin/bash -l

#SBATCH --account=mwavcs
#SBATCH --job-name=hyperdrive
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --partition=gpuq
#SBATCH --gres=tmp:50g,gpu:1
#SBATCH --time=00:10:00
#SBATCH --export=NONE

module use /pawsey/mwa/software/python3/modulefiles
module load hyperdrive
module load srclists
module list

hyperdrive -V

# Get the ObsID from the metafits filename
mfits=$(basename -- "$(ls ../*.metafits)")
obsid="${mfits%.*}"

# For brighter A-team sources, it may be better to use a specific sky model.
# Browse the $srclist_base directory and select a source list, e.g.
#
#   CenA: model-CenA-50comp_withalpha.txt
#   HerA: model-HerA-27comp_withalpha.txt
#   HydA: model-HydA-58comp_withalpha.txt
#   PicA: model-PicA-88comp_withalpha.txt
#
# If using a specific model, assign the source list to $srclist_target
srclist_target=
srclist_base=/pawsey/mwa/software/python3/mwa-reduce/mwa-reduce-git/models

if [[ -z $srclist_target ]]; then
    # Create a list of 1000 sources from the standard puma catalogue
    srclist=srclist_1000.yaml
    catalogue_srclist=${SRCLISTS_DIR}/srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt

    hyperdrive srclist-by-beam \
        --metafits ../*.metafits \
        --number 1000 \
        $catalogue_srclist \
        $srclist
else
    # Use a specific source list
    srclist=${srclist_base}/${srclist_target}
fi

# Perform DI calibration
# If necessary, flag tiles with --tile-flags <Tile1> <Tile2> ... <TileN>
hyperdrive di-calibrate \
    --source-list $srclist \
    --data ../${obsid}_birli.uvfits ../*.metafits \
    --fine-chan-flags-per-coarse-chan 0 1 30 31

# Plot the solutions
hyperdrive solutions-plot \
    --metafits ../*.metafits \
    hyperdrive_solutions.fits

# Convert to Offringa format for VCSBeam
hyperdrive solutions-convert \
    --metafits ../*.metafits \
    hyperdrive_solutions.fits \
    hyperdrive_solutions.bin |
The execution time for Hyperdrive will depend on the size of the input files, the source list, and the resources allocated. Using the downsampled UVFITS as input and 1000 sources, this job runs on Garrawarla in under 2 minutes.
Beamforming
Finding a target
Compiling a list of pulsars to beamform on can be done with the find_pulsar_in_obs.py script in vcstools. On Garrawarla, this can be loaded as follows:
Code Block | ||||
---|---|---|---|---|
| ||||
module load vcstools |
To find all of the pulsars within a given observation, the syntax is as follows:
Code Block | ||||
---|---|---|---|---|
| ||||
find_pulsar_in_obs.py -o <obs ID> |
To find all of the observations associated with a given source, you can either provide a pulsar J name or the equatorial coordinates:
Code Block | ||||
---|---|---|---|---|
| ||||
find_pulsar_in_obs.py -p <Jname>
find_pulsar_in_obs.py -c <RA_DEC> |
All of the above options also accept space-separated lists of arguments. For example, given a list of pulsars and a list of observations, to find which observations contain which pulsars, run the following:
Code Block | ||||
---|---|---|---|---|
| ||||
find_pulsar_in_obs.py -o <obs ID 1> <obs ID 2> ... <obs ID N> -p <Jname 1> <Jname 2> ... <Jname N> |
Note: For MWAX VCS observations, you must include the --all_volt option.
VCSBeam requires pointings to be specified in equatorial coordinates. To find these coordinates for a catalogued pulsar, you can use the ATNF catalogue's command line interface:
Code Block | ||||
---|---|---|---|---|
| ||||
/pawsey/mwa/singularity/psr-analysis/psr-analysis.sif psrcat -e3 -c "raj decj" <Jname> |
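The beamforming scripts later on this page pass the pointings to VCSBeam as a text file via the -P option. The sketch below appends a pointing to such a file after a basic format check. It assumes that the file holds one whitespace-separated "RA Dec" pair per line in sexagesimal format (HH:MM:SS.S and DD:MM:SS.S), which should be checked against the VCSBeam documentation. The function name add_pointing is hypothetical and the coordinates in the usage example are made up.

```shell
# Hypothetical helper: append one pointing to a pointings file.
# Assumes one "RA Dec" pair per line, e.g. "HH:MM:SS.SS +DD:MM:SS.SS".
# Usage: add_pointing <file> <RA> <Dec>
add_pointing() {
    local file="$1" raj="$2" decj="$3"
    # Basic sanity checks on the sexagesimal format
    if ! printf '%s\n' "${raj}" | grep -Eq '^[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?$'; then
        echo "error: bad RA '${raj}'" >&2
        return 1
    fi
    if ! printf '%s\n' "${decj}" | grep -Eq '^[+-]?[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?$'; then
        echo "error: bad Dec '${decj}'" >&2
        return 1
    fi
    printf '%s %s\n' "${raj}" "${decj}" >> "${file}"
}
```

For example, add_pointing pointings.txt 12:34:56.78 -12:34:56.7 appends the line "12:34:56.78 -12:34:56.7". Catalogue positions quoted at lower precision (without a seconds field) would be rejected by this check and need to be padded by hand.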
Tied-array beamforming with VCSBeam
VCSBeam is the successor to the legacy VCS tied-array beamformer. It is capable of processing both legacy and MWAX data into any of the following formats:
- fine-channelised (10 kHz) full-Stokes time series in PSRFITS format (-p option)
- fine-channelised (10 kHz) Stokes I time series in PSRFITS format (-N option)
- coarse-channelised (1.28 MHz) complex voltage time series in VDIF format (-v option)
The coarse-channelised output is made possible by the inverse PFB, which reverses the 10 kHz channelisation of the initial PFB stage. To run the tied-array beamformer, we use the make_mwa_tied_array_beam program, which is MPI-enabled to process multiple coarse channels in a single Slurm job. Further details about this command can be found here.
An example Slurm script is given below, where 24 channels are being processed across 24 nodes. In this case, the -p option is used to specify PSRFITS output.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
#!/bin/bash -l

#SBATCH --account=mwavcs
#SBATCH --job-name=beamform
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
#SBATCH --mem-per-cpu=8g
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --export=NONE

module use /pawsey/mwa/software/python3/modulefiles
module load vcsbeam
module list

make_mwa_tied_array_beam -V

#===============================================================================
# Required inputs
#-------------------------------------------------------------------------------
# path to VCS metafits file
metafits=
# path to combined (.dat) or MWAX (.sub) data directory
datadir=
# path to calibration solution from hyperdrive (should be a .bin file)
calsol=
# path to the calibrator observation metafits
calmetafits=
# the starting GPS second of the observation
startgps=
# how many seconds to process
duration=
# lowest coarse channel
lowchan=
#===============================================================================

srun make_mwa_tied_array_beam \
    -m ${metafits} \
    -b ${startgps} \
    -T ${duration} \
    -f ${lowchan} \
    -d ${datadir} \
    -P ${PWD}/pointings.txt \
    -F ${PWD}/flagged_tiles.txt \
    -c ${calmetafits} \
    -C ${calsol} \
    -p -R NONE -U 0,0 -O -X --smart |
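The required inputs in the script above are left blank for the user to fill in. If one is forgotten, the job will only fail once it reaches the front of the queue. A check such as the following sketch, placed just before the srun line, would make the job stop immediately with a clear message. The function name require_vars is hypothetical.

```shell
# Hypothetical helper: fail if any of the named shell variables is empty or unset.
# Usage: require_vars <variable name> [<variable name> ...]
require_vars() {
    local name value missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "error: required input '${name}' is not set" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

In the beamforming script this could be called as require_vars metafits datadir calsol calmetafits startgps duration lowchan || exit 1.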
Alternatively, the beamforming can be split into multiple smaller jobs using a Slurm job array. This can save time if the Slurm queue is busy. An example is given below for channels 109 to 132, where in this case the -v option is used to specify VDIF output.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
#!/bin/bash -l

#SBATCH --account=mwavcs
#SBATCH --job-name=beamform
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
#SBATCH --mem-per-cpu=8g
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --export=NONE
#SBATCH --array=109-132

module use /pawsey/mwa/software/python3/modulefiles
module load vcsbeam
module list

make_mwa_tied_array_beam -V

#===============================================================================
# Required inputs
#-------------------------------------------------------------------------------
# path to VCS metafits file
metafits=
# path to combined (.dat) or MWAX (.sub) data directory
datadir=
# path to calibration solution from hyperdrive (should be a .bin file)
calsol=
# path to the calibrator observation metafits
calmetafits=
# the starting GPS second of the observation
startgps=
# how many seconds to process
duration=
#===============================================================================

srun make_mwa_tied_array_beam \
    -m ${metafits} \
    -b ${startgps} \
    -T ${duration} \
    -f ${SLURM_ARRAY_TASK_ID} \
    -d ${datadir} \
    -P ${PWD}/pointings.txt \
    -F ${PWD}/flagged_tiles.txt \
    -c ${calmetafits} \
    -C ${calsol} \
    -v -R NONE -U 0,0 -O -X --smart |
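In the job array script above, each array task ID is used directly as the coarse channel number, so the --array range must match the channels of the observation. For a contiguous block of channels, the range can be derived from the lowest channel and the number of channels, as in the sketch below. The function name array_range is hypothetical.

```shell
# Hypothetical helper: print the --array range for a block of contiguous coarse channels.
# Usage: array_range <lowest coarse channel> <number of channels>
array_range() {
    local lowchan="$1" nchan="$2"
    echo "${lowchan}-$(( lowchan + nchan - 1 ))"
}

array_range 109 24   # prints: 109-132
```

Since options given on the sbatch command line take precedence over #SBATCH directives, the range can also be supplied at submission time, e.g. sbatch --array="$(array_range 109 24)" <script>.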