Recombining means converting legacy VCS data from one format to another. Both are headerless binary formats in which each byte represents one (4+4)-bit complex signed sample. The formats differ only in the ordering of the bytes and in how the bytes are distributed across multiple files. In this document, the "from" format is called the PFB or unrecombined format, and the "to" format is called the VCS or recombined format. The conversion is necessary because the beamformer currently supports only the VCS format as input.
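To make the sample packing concrete, here is a minimal sketch that unpacks one byte into its two signed 4-bit components. It assumes two's-complement nibbles with the real part in the high nibble and the imaginary part in the low nibble; neither detail is specified on this page, so check the actual format before relying on it. The name decode_sample is a hypothetical helper, not part of recombine.

```shell
#!/bin/bash
# Unpack one byte into two signed 4-bit values.
# ASSUMPTION: two's-complement nibbles, high nibble = real part,
# low nibble = imaginary part. The true layout is not documented here.
decode_sample() {
    local byte=$(( $1 & 0xFF ))
    local re=$(( (byte >> 4) & 0xF ))
    local im=$(( byte & 0xF ))
    # Sign-extend: nibble values 8..15 stand for -8..-1.
    if [ "$re" -ge 8 ]; then re=$(( re - 16 )); fi
    if [ "$im" -ge 8 ]; then im=$(( im - 16 )); fi
    echo "$re $im"
}

decode_sample 0x7F   # prints "7 -1"
decode_sample 0x80   # prints "-8 0"
```

Under these assumptions each component spans -8 to 7, so one byte carries a full complex sample.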
Software
The primary software for recombining is found in the mwa-voltage repository. This has recently been forked into its own dedicated recombine repository to promote further development and maintenance but, as of this writing (2022-06-01), the fork is identical in functionality and usage to mwa-voltage. In either case, the name of the executable is recombine. On Garrawarla, it is provided by the mwa-voltage and recombine modules. Future developments will be made available through the recombine module, but the mwa-voltage module will always remain available for compatibility with historical pipelines.
Usage
(See mwa-voltage and recombine for the most up-to-date usage documentation)
recombine -o <obsid> -t <secondid> -m <meta-data fits> -i <output dir> -c <skip coarse chan> -s <skip ICS> -f <file list> or -g <input file list>

<obsid>: observation ID of the data being processed
<secondid>: the second being processed
<meta-data fits>: meta-data FITS file containing tile flag information and various other useful information regarding the observation. To obtain the meta-data FITS file for a particular observation, use:
    wget -O <obsid>.metafits http://ws.mwatelescope.org/metadata/fits?obs_id=<obsid>
<output dir>: output product directory
<skip coarse chan>: 1 will skip the generation of the recombined coarse channel data
<skip ICS>: 1 will skip the generation of the incoherent sum
<file list>: locations of the 32 raw uncombined input files for a single second's worth of data (separate each with a space), given with -f
<input file list>: a file containing the locations of the 32 raw uncombined input files (separated by newlines), given with -g
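Since -g expects a newline-separated file naming the 32 raw files for one second, it can help to build that file and confirm the count before submitting a job. The following is a rough sketch only: make_file_list is a hypothetical helper, and the file name pattern <obsid>_<second>_vcs*.dat is taken from the Garrawarla example on this page rather than from any formal specification.

```shell
#!/bin/bash
# Write the raw file paths for one second to a newline-separated list
# (the form accepted by -g) and fail unless exactly 32 files are found.
# make_file_list is a hypothetical helper, not part of recombine.
# ASSUMPTION: raw files are named <obsid>_<second>_vcs*.dat.
make_file_list() {
    local raw_dir=$1 obsid=$2 second=$3 list=$4
    local -a files=( "$raw_dir"/"${obsid}_${second}"_vcs*.dat )
    # An unmatched glob is left as a literal, nonexistent path.
    [ -e "${files[0]}" ] || files=()
    if [ "${#files[@]}" -ne 32 ]; then
        echo "second $second: found ${#files[@]} raw files, expected 32" >&2
        return 1
    fi
    printf '%s\n' "${files[@]}" > "$list"
}

# Example use (paths as in the Garrawarla example):
#   make_file_list /astro/mwavcs/vcs/1267111608/raw 1267111608 1267111610 files.txt \
#       && recombine -o 1267111608 -t 1267111610 -m <meta-data fits> -i . -g files.txt
```

Failing early on a short file list avoids spending a job allocation on a second that would produce zero-padded output.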
Example for processing one second of data on Garrawarla
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=370gb
#SBATCH --time=00:10:00
#SBATCH --partition=workq
#SBATCH --account=mwavcs
#SBATCH --job-name=recombine_test
#SBATCH --output=recombine_test.out

module use /pawsey/mwa/software/python3/modulefiles
module load recombine

srun -n 1 -N 1 recombine \
  -o 1267111608 \
  -t 1267111610 \
  -m /astro/mwavcs/vcs/1267111608/1267111608_metafits_ppds.fits \
  -i . \
  -s 1 \
  -f /astro/mwavcs/vcs/1267111608/raw/1267111608_1267111610_vcs*.dat
Other scripts
vcs_download.nf
vcs_download.nf is a Nextflow script provided by the mwa_search repo. Its use is described on the main Documentation page.
process_vcs.py (deprecated)
process_vcs.py and checks.py are provided as part of VCSTools (the vcstools module on Garrawarla). Among many other things, process_vcs.py is a wrapper for running recombine on the GPU cluster ("gpuq") on Galaxy.
To recombine all of the data, use
process_vcs.py -m recombine -o <obs ID> -a
or, for only a subset of data, use
process_vcs.py -m recombine -o <obs ID> -b <starting GPS second> -e <end GPS second>
To monitor the progress of the jobs, use:
squeue -p gpuq -u $USER
This processing typically takes a few hours.
Checking the recombined data
It is a good idea to check at this stage to make sure that all of the data were recombined properly. To do this, use:
checks.py -m recombine -o <obs ID>
This checks that all of the recombined files are present and of the correct size. If any raw files are missing, the recombining process writes zero-padded files, which leaves gaps in your data. For a more robust check, beamform and splice the data (using the following steps) and then run:
prepdata -o recombine_test -nobary -dm 0 <fits files>
Then you can look through the produced .dat file for gaps using:
exploredat <.dat file>
Once you are happy that the data have been recombined correctly, you should delete the raw voltages, as they are no longer used in the pipeline and are a massive drain on storage resources.
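For readers who want to see what a presence-and-size check involves, here is a minimal sketch. It is not how checks.py is implemented: check_sizes is a hypothetical helper, and both the expected byte count and the recombined file names are left as arguments because they are not stated on this page.

```shell
#!/bin/bash
# Report every listed file that is missing or whose size differs from
# the expected byte count. Returns non-zero if any problem is found.
# check_sizes is a hypothetical helper, not part of checks.py.
check_sizes() {
    local expected=$1; shift
    local f size bad=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            bad=1
            continue
        fi
        # Arithmetic expansion strips any padding that wc adds.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -ne "$expected" ]; then
            echo "wrong size ($size bytes): $f"
            bad=1
        fi
    done
    return "$bad"
}

# Example use (expected size and file names are placeholders):
#   check_sizes <expected bytes> <recombined files>
```

A check of this kind cannot detect zero-padded files, because those have the correct size; that is what the prepdata and exploredat inspection is for.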
Planned future developments for recombine
- Add GPU support
- Improve the command-line interface
Description of PFB format
TO DO...