hyperdrive is currently still in heavy development, but it can already perform direction-independent calibration on a GPU or CPU.

More documentation: https://mwatelescope.github.io/mwa_hyperdrive/index.html

Project homepage: https://github.com/MWATelescope/mwa_hyperdrive

...

Code Block
languagebash
themeMidnight
titleExample module avail output
collapsetrue
---------------------------------- /pawsey/mwa/software/python3/modulefiles ----------------------------------
hyperdrive/chj    hyperdrive/v0.2.0-alpha11 (L,D)


Load a hyperdrive module:

Code Block
languagebash
themeMidnight
module load hyperdrive # this will load the default version

hyperdrive prefers to use the FEE beam when it's applicable. The associated beam code (hyperbeam) requires the MWA FEE beam file to be available at runtime. You can supply it either with the --beam-file command-line argument to hyperdrive or with the MWA_BEAM_FILE environment variable. garrawarla users typically don't need to worry about this, because the hyperdrive modules set MWA_BEAM_FILE automatically.
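The lookup order described above is: the command-line argument first, then MWA_BEAM_FILE. A job script could use that order to check for a beam file before starting hyperdrive. The following is a minimal sketch of that check. The function name `resolve_beam_file` is hypothetical, and this is not hyperdrive's own code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_beam_file(cli_beam_file: Optional[str],
                      environ: Mapping[str, str] = os.environ) -> Path:
    """Return the FEE beam file path, preferring a --beam-file value.

    Falls back to the MWA_BEAM_FILE environment variable, mirroring the
    order described in the hyperdrive help text (hypothetical helper).
    """
    if cli_beam_file:
        return Path(cli_beam_file)
    env_value = environ.get("MWA_BEAM_FILE")
    if env_value:
        return Path(env_value)
    raise RuntimeError("No MWA FEE beam file: pass --beam-file or set MWA_BEAM_FILE")


if __name__ == "__main__":
    # Pre-flight check, e.g. before submitting a job.
    try:
        path = resolve_beam_file(None)
        print(f"Beam file: {path} (exists: {path.exists()})")
    except RuntimeError as err:
        print(err)
```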

How do I get started?

Have a look at the help text!

The following is current as of 21 February 2022.

See help text:

Code Block
languagebash
themeMidnight
hyperdrive -h # -h could also be --help
Code Block
languagetext
themeMidnight
titleExample help output
collapsetrue
hyperdrive 0.2.0-alpha9
https://github.com/MWATelescope/mwa_hyperdrive
Calibration software for the Murchison Widefield Array (MWA) radio telescope

USAGE:
    hyperdrive <SUBCOMMAND>

OPTIONS:
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    di-calibrate         Perform direction-independent calibration on the input MWA data. See for more
                         info: https://github.com/MWATelescope/mwa_hyperdrive/wiki/Calibration-usage
    simulate-vis         Simulate visibilities of a sky-model source list
    solutions-convert    Convert between calibration solution file formats
    solutions-plot       Plot calibration solutions
    srclist-by-beam      Reduce a sky-model source list to the top N brightest sources, given pointing
                         information
    srclist-convert      Convert a sky-model source list from one format to another
    srclist-shift        Shift the sources in a source list. Useful to correct for the ionosphere. The
                         shifts must be detailed in a .json file, with source names as keys associated with
                         an "ra" and "dec" in degrees. Only the sources specified in the .json are written
                         to the output source list
    srclist-verify       Verify that sky-model source lists can be read by hyperdrive
    dipole-gains         Print information on the dipole gains listed by a metafits file
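The srclist-shift description implies a .json file keyed by source name, where each entry holds an "ra" and "dec" shift in degrees. The following is a minimal sketch that writes such a file. It assumes this nesting, and the exact schema hyperdrive expects may differ. The source names and the output filename are hypothetical placeholders.

```python
import json

# Hypothetical shifts in degrees; keys must match source names in your list.
shifts = {
    "source_a": {"ra": 0.01, "dec": -0.02},
    "source_b": {"ra": -0.005, "dec": 0.0},
}

# Only the sources named here are written to the shifted output source list.
with open("shifts.json", "w") as f:
    json.dump(shifts, f, indent=2)
```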

hyperdrive is broken up into many subcommands. Each of these has its own help; e.g.

Code Block
languagetext
themeMidnight
titleExample help output for hyperdrive di-calibrate
collapsetrue
hyperdrive-di-calibrate 0.2.0-alpha9
Perform direction-independent calibration on the input MWA data. See for more info:
https://github.com/MWATelescope/mwa_hyperdrive/wiki/Calibration-usage

USAGE:
    hyperdrive di-calibrate [OPTIONS] [--] [ARGUMENTS_FILE]

ARGS:
    <ARGUMENTS_FILE>    All of the arguments to di-calibrate may be specified in a toml or json file. Any CLI arguments
                        override parameters set in the file

OPTIONS:
    -v, --verbosity    The verbosity of the program. Increase by specifying multiple times (e.g. -vv). The default is to print
                       only high-level information
        --dry-run      Don't actually do calibration; just verify that arguments were correctly ingested and print out high-
                       level information
    -h, --help         Print help information
    -V, --version      Print version information

INPUT FILES:
    -d, --data <DATA>...                         Paths to input data files to be calibrated. These can include a metafits file,
                                                 gpubox files, mwaf files, a measurement set and/or uvfits files
    -s, --source-list <SOURCE_LIST>              Path to the sky-model source list file
        --source-list-type <SOURCE_LIST_TYPE>    The type of sky-model source list. Valid types are: hyperdrive, rts, woden,
                                                 ao. If not specified, all types are attempted

OUTPUT FILES:
    -o, --outputs <OUTPUTS>...
            Paths to the calibration output files. Supported calibrated visibility outputs: uvfits. Supported calibration
            solution formats: fits, bin. Default: hyperdrive_solutions.bin

    -m, --model-filename <MODEL_FILENAME>
            The path to the file where the generated sky-model visibilities are written. If this argument isn't supplied, then
            no file is written. Supported formats: uvfits

        --ignore-autos
            When writing out calibrated visibilities, don't include auto-correlations

        --output-vis-time-average <OUTPUT_VIS_TIME_AVERAGE>
            When writing out calibrated visibilities, average this many timesteps together. Also supports a target time
            resolution (e.g. 8s). The value must be a multiple of the input data's time resolution. The default is to preserve
            the input data's time resolution. e.g. If the input data is in 0.5s resolution and this variable is 4, then we
            average 2s worth of calibrated data together before writing the data out. If the variable is instead 4s, then 8
            calibrated timesteps are averaged together before writing the data out

        --output-vis-freq-average <OUTPUT_VIS_FREQ_AVERAGE>
            When writing out calibrated visibilities, average this many fine freq. channels together. Also supports a target
            freq. resolution (e.g. 80kHz). The value must be a multiple of the input data's freq. resolution. The default is to
            preserve the input data's freq. resolution. e.g. If the input data is in 40kHz resolution and this variable is 4,
            then we average 160kHz worth of calibrated data together before writing the data out. If the variable is instead
            80kHz, then 2 calibrated fine freq. channels are averaged together before writing the data out

SKY-MODEL SOURCES:
    -n, --num-sources <NUM_SOURCES>
            The number of sources to use in the source list. The default is to use them all. Example: If 1000 sources are
            specified here, then the top 1000 sources are used (based on their flux densities after the beam attenuation)
            within the specified source distance cutoff

        --source-dist-cutoff <SOURCE_DIST_CUTOFF>
            Specifies the maximum distance from the phase centre a source can be [degrees]. Default: 50

        --veto-threshold <VETO_THRESHOLD>
            Specifies the minimum Stokes XX+YY a source must have before it gets vetoed [Jy]. Default: 0.01

BEAM:
        --beam-file <BEAM_FILE>    The path to the HDF5 MWA FEE beam file. If not specified, this must be provided by the
                                   MWA_BEAM_FILE environment variable
        --unity-dipole-gains       Pretend that all MWA dipoles are alive and well, ignoring whatever is in the metafits file
        --delays <DELAYS>...       If specified, use these dipole delays for the MWA pointing
        --no-beam                  Don't apply a beam response when generating a sky model. The default is to use the FEE beam

CALIBRATION:
    -t, --time-average-factor <TIME_AVERAGE_FACTOR>
            The number of time samples to average together during calibration. Also supports a target time resolution (e.g.
            8s). If this is 0, then all data are averaged together. Default: 0. e.g. If this variable is 4, then we produce
            calibration solutions in timeblocks with up to 4 timesteps each. If the variable is instead 4s, then each timeblock
            contains up to 4s worth of data

    -f, --freq-average-factor <FREQ_AVERAGE_FACTOR>
            The number of fine-frequency channels to average together before calibration. If this is 0, then all data is
            averaged together. Default: 1. e.g. If the input data is in 20kHz resolution and this variable was 2, then we
            average 40kHz worth of data into a chanblock before calibration. If the variable is instead 40kHz, then each
            chanblock contains upto 40kHz worth of data

        --timesteps <TIMESTEPS>...
            The timesteps to use from the input data. The timesteps will be ascendingly sorted for calibration. No duplicates
            are allowed. The default is to use all unflagged timesteps

        --uvw-min <UVW_MIN>
            The minimum UVW length to use. This value must have a unit annotated. Allowed units: λ, kλ, l, kl, lambda, klambda,
            m, km. Default: 50λ

        --uvw-max <UVW_MAX>
            The maximum UVW length to use. This value must have a unit annotated. Allowed units: λ, kλ, l, kl, lambda, klambda,
            m, km. No default.

        --max-iterations <MAX_ITERATIONS>
            The maximum number of times to iterate when performing "MitchCal". Default: 50

        --stop-thresh <STOP_THRESH>
            The threshold at which we stop iterating when performing "MitchCal". Default: 1e-8

        --min-thresh <MIN_THRESH>
            The minimum threshold to satisfy convergence when performing "MitchCal". Even when this threshold is exceeded,
            iteration will continue until max iterations or the stop threshold is reached. Default: 1e-4

        --array_longitude <ARRAY_LONGITUDE_DEG>
            The Earth longitude of the instrumental array [degrees]. Default (MWA): 116.67081523611111°

        --array_latitude <ARRAY_LATITUDE_DEG>
            The Earth latitude of the instrumental array [degrees]. Default (MWA): -26.703319405555554°

        --cpu
            Use the CPU for visibility generation. This is deliberately made non-default because using a GPU is much faster

FLAGGING:
        --tile-flags <TILE_FLAGS>...
            Additional tiles to be flagged. These values correspond to either the values in the "Antenna" column of HDU 2 in
            the metafits file (e.g. 0 3 127), or the "TileName" (e.g. Tile011)

        --ignore-input-data-tile-flags
            If specified, pretend that all tiles are unflagged in the input data

        --ignore-input-data-fine-channel-flags
            If specified, pretend all fine channels in the input data are unflagged

        --fine-chan-flags-per-coarse-chan <FINE_CHAN_FLAGS_PER_COARSE_CHAN>...
            The fine channels to be flagged in each coarse channel. e.g. 0 1 16 30 31 are typical for 40 kHz data. If this is
            not specified, it defaults to flagging 80 kHz (or as close to this as possible) at the edges, as well as the centre
            channel for non-MWAX data

        --fine-chan-flags <FINE_CHAN_FLAGS>...
            The fine channels to be flagged across the whole observation band. e.g. 0 767 are the first and last fine channels
            for 40 kHz data

RAW MWA DATA:
        --pfb-flavour <PFB_FLAVOUR>     The 'flavour' of poly-phase filter bank corrections applied to raw MWA data. The
                                        default is 'empirical'. Valid flavours are: empirical, levine, none
        --no-digital-gains              When reading in raw MWA data, don't apply digital gains
        --no-cable-length-correction    When reading in raw MWA data, don't apply cable length corrections. Note that some data
                                        may have already had the correction applied before it was written
        --no-geometric-correction       When reading in raw MWA data, don't apply geometric corrections. Note that some data
                                        may have already had the correction applied before it was written

USER INTERFACE:
        --no-progress-bars    When reading in visibilities and generating sky-model visibilities, don't draw progress bars
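--uvw-min and --uvw-max accept either wavelengths or metres, and converting between the two needs a frequency (wavelength = c / f). The following sketch parses the unit spellings listed in the help text. It assumes a single reference frequency; the help text does not say how hyperdrive handles the frequency dependence across the band. The helper name `uvw_cutoff_in_lambda` is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

# (suffix, multiplier, is_wavelength) for the units listed in the help text.
# Longer suffixes come first so that e.g. "kl" is not mistaken for "l".
_UNITS = [
    ("klambda", 1e3, True), ("lambda", 1.0, True),
    ("kλ", 1e3, True), ("λ", 1.0, True),
    ("kl", 1e3, True), ("l", 1.0, True),
    ("km", 1e3, False), ("m", 1.0, False),
]


def uvw_cutoff_in_lambda(cutoff: str, freq_hz: float) -> float:
    """Convert a cutoff such as "50λ" or "3km" to wavelengths at freq_hz."""
    text = cutoff.strip()
    for suffix, scale, is_lambda in _UNITS:
        if text.endswith(suffix):
            value = float(text[: -len(suffix)]) * scale
            if is_lambda:
                return value
            # A length in metres divided by the wavelength (c / f).
            return value * freq_hz / SPEED_OF_LIGHT_M_PER_S
    raise ValueError(f"No unit annotated on UVW cutoff {cutoff!r}")
```

For example, at 150 MHz the wavelength is about 2 m, so the default 50λ corresponds to a baseline of roughly 100 m.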

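You can work through the default for --fine-chan-flags-per-coarse-chan by hand. The following sketch assumes 1.28 MHz MWA coarse channels and rounds 80 kHz to the nearest whole number of fine channels (the help text does not give hyperdrive's exact rounding). With those assumptions it reproduces the "0 1 16 30 31" example for 40 kHz data. The function name is hypothetical.

```python
from typing import List

# MWA coarse channels are 1.28 MHz wide.
COARSE_CHAN_WIDTH_KHZ = 1280.0


def default_fine_chan_flags(fine_chan_res_khz: float, is_mwax: bool) -> List[int]:
    """Fine-channel indices flagged in every coarse channel by default.

    Flags ~80 kHz at each edge (rounded to a whole number of fine channels)
    plus, for non-MWAX data, the centre channel.
    """
    num_fine = round(COARSE_CHAN_WIDTH_KHZ / fine_chan_res_khz)
    num_edge = round(80.0 / fine_chan_res_khz)
    flags = set(range(num_edge))                        # low edge
    flags.update(range(num_fine - num_edge, num_fine))  # high edge
    if not is_mwax:
        flags.add(num_fine // 2)                        # centre channel
    return sorted(flags)


print(default_fine_chan_flags(40.0, is_mwax=False))  # [0, 1, 16, 30, 31]
```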
DI calibration

Available with hyperdrive di-calibrate

Two main things are required to calibrate visibilities:

  • Raw data (gpubox files or MWAX ch??? files) or a data container (measurement set or uvfits); and
  • A sky-model source list.

Discussion on the source lists and the applicable formats can be found here.

By default, hyperdrive will attempt to use all sources in the source list file.  If there are more than 1,000 sources in the file, then it may take a long time if you're not using a GPU.  In order to keep the number of sources used low, one could use the -n/--num-sources and/or --veto-threshold flags, or use a source list with fewer sources in the first place (see hyperdrive srclist-by-beam).
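The -n/--num-sources, --source-dist-cutoff and --veto-threshold descriptions in the help text suggest a selection along these lines. This is a sketch, not hyperdrive's implementation. It assumes each source's beam-attenuated Stokes XX+YY flux density and its distance from the phase centre are already known (computing the beam response is the expensive part, and is not shown). The order of the filters and the inclusive comparisons are also assumptions. The `Source` type and the `select_sources` name are hypothetical, and the defaults mirror the help text (50° and 0.01 Jy).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    name: str
    apparent_xx_plus_yy_jy: float  # Stokes XX+YY after beam attenuation [Jy]
    dist_from_phase_centre_deg: float


def select_sources(sources: List[Source],
                   num_sources: Optional[int] = None,
                   dist_cutoff_deg: float = 50.0,
                   veto_threshold_jy: float = 0.01) -> List[Source]:
    """Drop distant and faint sources, sort brightest first, keep the top N."""
    kept = [
        s for s in sources
        if s.dist_from_phase_centre_deg <= dist_cutoff_deg
        and s.apparent_xx_plus_yy_jy >= veto_threshold_jy
    ]
    kept.sort(key=lambda s: s.apparent_xx_plus_yy_jy, reverse=True)
    return kept if num_sources is None else kept[:num_sources]
```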

...


module load hyperdrive/chj # load CHJ's development version


Example Slurm script

Code Block
languagebash
themeMidnight
#!/bin/bash -l
#SBATCH --job-name=hyp-$1
#SBATCH --output=hyperdrive.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00
#SBATCH --clusters=garrawarla
#SBATCH --partition=gpuq
#SBATCH --account=mwaeor
#SBATCH --export=NONE
#SBATCH --gres=gpu:1,tmp:50g

module use /pawsey/mwa/software/python3/modulefiles
module load hyperdrive

set -eux
command -v hyperdrive

cd /astro/mwaeor/MWA/data/1090008640

# Get calibration solutions. Use the top 1000 sources.
hyperdrive di-calibrate \
    -s /pawsey/mwa/software/python3/srclists/master/srclist_pumav3_EoR0aegean_fixedEoR1pietro+ForA_phase1+2.txt \
    -n 1000 \
    -d *gpubox*.fits *.metafits \
    -o hyp_sols.fits

Writing out calibrated visibilities

hyperdrive di-calibrate can write out calibrated visibilities, but only for the data that was read in for calibration. This means that any omitted timesteps are also omitted in the output. To apply any solutions file to any input data, use the solutions-apply subcommand.

The output calibrated visibilities can also be averaged in time and frequency (by multiples of the input resolution or to a target quantity).

Code Block
languagebash
themeMidnight
titleWriting out calibrated visibilities
# Write solutions to "hyp_sols.fits" and calibrated vis to "hyp_cal.uvfits"
hyperdrive di-calibrate -s srclist_1000.yaml \
    -d *gpubox*.fits *.metafits *.mwaf \
    -o hyp_sols.fits hyp_cal.uvfits \
    --output-vis-time-average 2 \
    --output-vis-freq-average 80kHz
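The help text describes two ways of specifying output averaging:

  • a plain factor, or
  • a target resolution, which must be a multiple of the input resolution.

The following sketch of that arithmetic reproduces the help text's worked examples. For 0.5 s data, "4" averages 4 timesteps and "4s" averages 8. For 40 kHz data, "80kHz" averages 2 fine channels. The accepted unit spellings and the helper name `averaging_factor` are assumptions; this is not hyperdrive's parser.

```python
import re

# Unit multipliers relative to each base unit (an assumption for this sketch):
# seconds for time, kHz for frequency.
TIME_UNITS = {"s": 1.0, "ms": 1e-3}
FREQ_UNITS = {"khz": 1.0, "mhz": 1e3}


def averaging_factor(value: str, input_res: float, units: dict) -> int:
    """Return how many input samples to average together.

    `value` is a plain factor (e.g. "4") or a target resolution with a unit
    (e.g. "8s", "80kHz"); `input_res` is in the base unit of `units`.
    """
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"Can't parse {value!r}")
    number, unit = float(match.group(1)), match.group(2).lower()
    if not unit:
        # A plain factor: average this many samples together.
        if not number.is_integer() or number < 1:
            raise ValueError(f"{value!r} is not a positive whole factor")
        return int(number)
    if unit not in units:
        raise ValueError(f"Unknown unit in {value!r}")
    # A target resolution must be a whole multiple of the input resolution.
    factor = number * units[unit] / input_res
    if abs(factor - round(factor)) > 1e-9 or round(factor) < 1:
        raise ValueError(f"{value!r} is not a multiple of the input resolution")
    return round(factor)


# Worked examples from the di-calibrate help text.
print(averaging_factor("4", 0.5, TIME_UNITS))       # 4 timesteps (2 s)
print(averaging_factor("4s", 0.5, TIME_UNITS))      # 8 timesteps
print(averaging_factor("80kHz", 40.0, FREQ_UNITS))  # 2 fine channels
```

For 40 kHz input data, --output-vis-freq-average 80kHz in the command above corresponds to a factor of 2.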

Plotting calibration solutions

Any DI solutions file that hyperdrive can read (e.g. André Offringa's calibrate output or RTS solutions) can be plotted directly with hyperdrive. If you're using a supercomputer, there's no need to submit a job to the queue; plotting is fast enough to run on the login node. It's also good to plot with the corresponding metafits file to get more information:

Code Block
languagebash
themeMidnight
hyperdrive solutions-plot -m *.metafits hyp_sols.fits

If you want to do more analysis with Python, this code reads and plots solutions in hyperdrive's FITS format:

Code Block
languagepy
themeMidnight
titlePython plotting code
collapsetrue
#!/usr/bin/env python

import sys
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt

if len(sys.argv) == 1:
    filename = "hyp_sols.fits"
else:
    filename = sys.argv[1]

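# The solutions HDU holds float64 values with shape
# (num_timeblocks, num_tiles, num_chanblocks, 8): the real and imaginary
# parts of each tile's 2x2 Jones matrix are interleaved.
with fits.open(filename) as f:
    data = f[1].data
    # Only looking at the first timeblock. Build the complex array while the
    # file is still open; the result has shape (num_tiles, num_chanblocks, 4).
    i_timeblock = 0
    data = data[i_timeblock, :, :, ::2] + data[i_timeblock, :, :, 1::2] * 1j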

# Uncomment if you want to divide by a reference.
# i_tile_ref = -1
# refs = []
# for ref in data[i_tile_ref].reshape((-1, 2, 2)):
#     refs.append(np.linalg.inv(ref))
# refs = np.array(refs)
# j_div_ref = []
# for tile_j in data:
#     for (j, ref) in zip(tile_j, refs):
#         j_div_ref.append(j.reshape((2, 2)).dot(ref))
# data = np.array(j_div_ref).reshape(data.shape)

# Amps
amps = np.abs(data)
# The 8x16 grid below shows at most 128 tiles.
num_tiles_to_plot = min(amps.shape[0], 128)

_, ax = plt.subplots(8, 16, sharex=True, sharey=True)
# Uncomment if you want to manually set the y-limit
# ax[0, 0].set_ylim(0, 2)
for i in range(num_tiles_to_plot):
    ax[i // 16, i % 16].plot(amps[i, :, 0])  # XX
    ax[i // 16, i % 16].plot(amps[i, :, 3])  # YY
plt.show()

# Phases [degrees]
phases = np.rad2deg(np.angle(data))

_, ax = plt.subplots(8, 16, sharex=True, sharey=True)
ax[0, 0].set_ylim(-180, 180)
for i in range(num_tiles_to_plot):
    ax[i // 16, i % 16].plot(phases[i, :, 0])  # XX
    ax[i // 16, i % 16].plot(phases[i, :, 3])  # YY
plt.show()
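The commented-out reference-division loop in the script above can also be written with NumPy broadcasting, which avoids the Python-level loops. The following is a sketch with a hypothetical helper name. It assumes `data` has the (num_tiles, num_chanblocks, 4) shape produced by the script. It also assumes the reference tile's matrices are invertible (e.g. not NaN because of flagging).

```python
import numpy as np


def divide_by_reference(jones: np.ndarray, i_tile_ref: int = -1) -> np.ndarray:
    """Right-multiply each tile's Jones matrices by the reference tile's inverse.

    `jones` has shape (num_tiles, num_chanblocks, 4); each group of 4 elements
    is treated as a row-major 2x2 matrix, as in the commented-out loop.
    """
    num_tiles, num_chanblocks, _ = jones.shape
    j = jones.reshape(num_tiles, num_chanblocks, 2, 2)
    ref_inv = np.linalg.inv(j[i_tile_ref])  # (num_chanblocks, 2, 2)
    # Broadcasting applies the per-chanblock inverse to every tile at once.
    return (j @ ref_inv).reshape(jones.shape)
```

With this helper, `data = divide_by_reference(data)` does the same job as the commented-out block.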

Planned features

As hyperdrive is still in heavy development, not all features are currently available.  An indication of what is available is below.

  •  Reads raw MWA data
  •  Reads a single uvfits file as input
  •  Reads multiple uvfits files as input
  •  Reads a single measurement set file as input
  •  Reads multiple measurement set files as input
  •  Calibrates on the CPU
  •  Calibrates on a GPU
  •  Writes calibration solutions to the "André Offringa calibrate format"
  •  Writes calibration solutions in the "RTS format"
  •  Writes calibrated visibilities directly to uvfits output
  •  Writes calibrated visibilities directly to measurement set output
The example Slurm script then applies the solutions with hyperdrive solutions-apply and writes out a measurement set:

Code Block
languagebash
themeMidnight
# Apply the solutions and write out a measurement set.
# Write it to /nvmetmp as that's much faster than /astro.
hyperdrive solutions-apply \
    -d *gpubox*.fits *.metafits *.mwaf \
    -s hyp_sols.fits \
    -o /nvmetmp/hyp_calibrated.ms \
    --time-average 8s \
    --freq-average 80kHz

# Move the measurement set to /astro.
mv /nvmetmp/hyp_calibrated.ms .

This example script reserves 50 GB of space for node local storage (/nvmetmp). If your output visibilities are bigger than this, then the write will fail; you should adjust the #SBATCH --gres=gpu:1,tmp:50g line to account for this, e.g. #SBATCH --gres=gpu:1,tmp:200g
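A back-of-the-envelope estimate of the visibility volume can help you pick a tmp size. The following sketch gives only a rough lower bound. It assumes 4 polarisations, with complex64 visibilities and float32 weights per polarisation. Real measurement sets also store UVWs, flags and metadata, so they will be larger. The helper name and the example numbers (128 tiles, 15 timesteps, 384 channels) are hypothetical.

```python
def estimate_vis_bytes(num_tiles: int, num_timesteps: int, num_chans: int,
                       include_autos: bool = True,
                       bytes_per_pol: int = 8 + 4) -> int:
    """Rough size of the visibility and weight data, ignoring all overheads.

    Assumes 4 polarisations, each with a complex64 visibility (8 bytes) and a
    float32 weight (4 bytes).
    """
    num_baselines = num_tiles * (num_tiles - 1) // 2
    if include_autos:
        num_baselines += num_tiles
    return num_baselines * num_timesteps * num_chans * 4 * bytes_per_pol


# Hypothetical example: 128 tiles, 15 output timesteps, 384 output channels.
size_gb = estimate_vis_bytes(128, 15, 384) / 1e9
print(f"~{size_gb:.1f} GB (lower bound)")
```

If the estimate approaches the reserved space, increase the tmp value in the #SBATCH --gres line with a generous margin.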