VCS pulsar data processing
...
and this will help you determine the correct GPS times to pass to the -b/-e options. This applies to all jobs that take optional beginning and end times.
...
Recombine
...
As noted above in Downloading Data, the vcs_download.nf script includes the recombine step, so manual recombining is only necessary when the Nextflow option is unavailable or undesirable. (Note that with the recent switch to ASVO as the data download server, vcs_download.nf is now a bit of a misnomer: it can still be used to do the recombining, even if the data are already downloaded.) Other methods of recombining are detailed on the Recombine page.
Incoherent sums
After the download has completed, you already have everything needed to create the first kind of beamformed data: an incoherent sum. The data used to create this kind of beamformed output are labelled <obsID>_<GPS second>_ics.dat in the downloaded data directory (/group/mwavcs/vcs/<obs ID>/combined by default), hence we refer to the incoherent sum files as ICS files. The incoherent sum is essentially the sum of the tile powers. It retains the full field of view of a single tile and gives a √N improvement in sensitivity over a single tile, where N is the number of tiles combined. Unfortunately, this kind of data is also very susceptible to RFI corruption, so you will need to be quite stringent with your RFI mitigation (in both time and frequency).
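As a rough illustration of the √N scaling described above, the following minimal Python sketch sums per-tile power samples and reports the ideal sensitivity gain. The function names and the toy input are hypothetical and are not part of the VCS tools; real ICS files are produced by the recombine step, not by this code.

```python
import math


def incoherent_sum(tile_powers):
    """Add detected power across tiles, sample by sample.

    tile_powers: one equal-length list of power samples per tile (toy data).
    """
    return [sum(samples) for samples in zip(*tile_powers)]


def ics_sensitivity_gain(n_tiles):
    """Ideal sensitivity improvement over a single tile: sqrt(N)."""
    return math.sqrt(n_tiles)


# Toy example: three tiles, three time samples each (hypothetical values).
toy_powers = [
    [1.0, 2.0, 3.0],  # tile 1
    [0.5, 1.5, 2.5],  # tile 2
    [1.5, 2.5, 3.5],  # tile 3
]
print(incoherent_sum(toy_powers))            # [3.0, 6.0, 9.0]
print(round(ics_sensitivity_gain(128), 2))   # 11.31
```

For a full 128-tile array the ideal gain is therefore a little over 11 times the sensitivity of a single tile, before any losses from flagged tiles or RFI.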
...
dedicated calibrator: usually observed directly before or after the target observation on a bright calibrator source. These are stored on the MWA data archive as visibilities (they are run through the online correlator like normal MWA observations). You can find the observation ID for the calibrator by searching the MWA data archive or by using the following command, which lists compatible calibrator IDs sorted by how close they are in time to the observation:
```
mwa_metadb_utils.py -c <obs ID>
```
If this command generates an error, it may be because there are no calibration observations with the same frequency channels. If so, try the Calibration Combining Method.
- in-beam calibrator: using data from the target observation itself, correlating a section offline, and using those visibilities. See Offline correlation.
In order to download the calibration observation, set your MW ASVO API key as an environment variable. Below are some steps to do so:
...
- If there are any tiles with max>3 on multiple channels, they are worth flagging.
- If there are any tiles with max=0, they are worth flagging (even if only one polarisation has max=0!), as they contribute no signal and can cause errors in beamforming.
- It is best to flag only up to ~3 tiles at a time, as bad tiles can affect other, potentially good tiles.
- Sensitivity scales as ~sqrt(128-N), where N is the number of tiles flagged, so flagging more than 50 tiles will start to lower your sensitivity, and at that point the calibration may be worth abandoning.
- Make sure you put the number on the right of the key (between flag and ?) into the flagged_tiles.txt file.
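To put numbers on the ~sqrt(128-N) scaling in the list above, here is a minimal Python sketch that expresses the remaining sensitivity as a fraction of the full 128-tile array. The function name is hypothetical, and the calculation assumes all tiles contribute equally, which is only an approximation.

```python
import math


def relative_sensitivity(n_flagged, n_tiles=128):
    """Fraction of full-array sensitivity left after flagging n_flagged tiles.

    Assumes sensitivity scales as sqrt(n_tiles - n_flagged), as stated above.
    """
    if not 0 <= n_flagged <= n_tiles:
        raise ValueError("n_flagged must be between 0 and n_tiles")
    return math.sqrt((n_tiles - n_flagged) / n_tiles)


for n in (0, 3, 50):
    print(n, round(relative_sensitivity(n), 3))
```

Under this assumption, flagging 3 tiles costs about 1% of the sensitivity, whereas flagging 50 tiles leaves roughly 78% of it.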
The "attempt_number_N" subdirectory contains chan_x_output.txt and phase_x_output.txt files, which list all of the flags recommended by the calibration plotting script. These can be useful when deciding which tile(s) to flag next. The following bash command will output the worst tile for each channel:
```
# For each channel file, print the line(s) containing the largest value
for i in chan*txt; do grep "$(cut -d '=' -f 3 "$i" | cut -d ' ' -f 1 | sort -n | tail -n 1)" "$i"; done
```
...
Image accurate as of commit e6215f42c1d7c0b5a255721bc46840335170e579 to mwa_search repo
- Input Data: The OPP requires the calibrated and beamformed products of the VCS. These data can be acquired using the method described here.
- Pulsar Search: Given an observation ID, each pulsar within the field is identified and handed to the Pulsar Processing Pipeline (PPP).
- Initial Fold(s): Performs a PRESTO fold on the data. For slow pulsars, this will probably use 100 bins; for fast pulsars, 50 bins.
- Classification: The resulting folds are classified as either a detection or a non-detection.
- Best Pointing: For the MWA's extended array configuration, there may be multiple pointings for a single pulsar. Should this be the case, we want to find the brightest detection to use for the rest of the pipeline. The "best" detection will be decided on and its pointing will be the only one used going forward.
- Post Folds: A series of high-bin folds will be done in order to find the highest time resolution fold that still yields a detection.
- Upload products to database: Uploads the initial fold and best fold to the pulsar database.
- IQUV Folding: Uses DSPSR to fold on Stokes IQUV, making a time-scrunched archive. This archive is immediately converted back to PSRFITS format for use with PSRSALSA.
- RM Synthesis: Runs RM synthesis on the archive. If successful, the RM correction is applied.
- RVM Fitting: Attempts to fit the Rotating Vector Model to the profile. If successful, the products are uploaded to the database.
...
```
nswainston@garrawarla-1:~> ssh garrawarla-2
```
Resuming Nextflow Pipelines
One large benefit of Nextflow pipelines is that they can be resumed. Once you have fixed the bug that caused the pipeline to crash, simply relaunch the pipeline with the -resume option added. For the resume option to work, you must run the command from the same directory, and the work directory must not have been deleted.
Cleaning up the work directories
Once the pipeline is done and you are confident that you will not need to resume it or use the intermediate files, it is a good idea to remove the Nextflow work directories to save space. By default, the work directories are stored in /astro/mwavcs/$USER/<obsid>_work.
Calibration Combining Method
...
The name of a calibrator observation consists of the name of the calibrator source, an underscore, and the centre frequency channel ID. Try to find a pair of calibration observations that use the same calibrator source and together cover the entire frequency range of the target observation. For the above example, this was 1195317056 and 1195316936. If you can't find any suitable calibration observations, you can keep increasing the time search window up to 48 hours.
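The pairing rule just described (same calibrator source, joint coverage of the target's frequency channels) can be sketched in Python as follows. This is only an illustration: the function names, the observation IDs, the observation names, and the channel lists are all hypothetical, and in practice the channel coverage of each observation would have to come from the observation metadata.

```python
from itertools import combinations


def parse_cal_name(name):
    """Split '<source>_<centre channel>' into (source, centre channel)."""
    source, centre = name.rsplit("_", 1)
    return source, int(centre)


def find_covering_pairs(calibrators, target_channels):
    """Return ID pairs with the same source that jointly cover the target.

    calibrators: list of (obs_id, obs_name, channels) tuples (toy data).
    """
    target = set(target_channels)
    pairs = []
    for (id_a, name_a, ch_a), (id_b, name_b, ch_b) in combinations(calibrators, 2):
        if parse_cal_name(name_a)[0] != parse_cal_name(name_b)[0]:
            continue  # different calibrator sources
        if target <= set(ch_a) | set(ch_b):
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical calibrators; IDs, names and channels are made up.
toy_calibrators = [
    (1000000001, "HydA_121", range(109, 133)),
    (1000000002, "HydA_145", range(133, 157)),
    (1000000003, "PicA_121", range(109, 133)),
]
# Hypothetical target observing two separated blocks of channels.
toy_target = list(range(109, 121)) + list(range(145, 157))
print(find_covering_pairs(toy_calibrators, toy_target))
# [(1000000001, 1000000002)]
```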
Now that you know which calibration observations you need, download and calibrate them as you normally would, as explained in the calibration section. It is best to use the same values in flagged_tiles.txt and flagged_channels.txt for all calibration observations to ensure that your calibration solutions are consistent. Once the calibration is complete, you can combine the two calibrations into one using the script
...
This will output the combined calibration solution to /astro/mwavcs/vcs/[obs ID]/cal/<first calibration ID>_<second calibration ID>/rts and you can treat <first calibration ID>_<second calibration ID> as the calibrator ID when it is used in other scripts.
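As a small illustration of this naming convention, the Python sketch below builds the combined calibrator ID and the output directory. It assumes the two calibration IDs are joined by a single underscore with no space; the function names and the target observation ID are hypothetical, while the two calibration IDs are the ones from the example above.

```python
def combined_cal_id(first_cal_id, second_cal_id):
    """Join two calibration IDs into the combined calibrator ID."""
    return f"{first_cal_id}_{second_cal_id}"


def combined_cal_dir(obs_id, first_cal_id, second_cal_id, base="/astro/mwavcs/vcs"):
    """Directory where the combined calibration solution is expected."""
    cal_id = combined_cal_id(first_cal_id, second_cal_id)
    return f"{base}/{obs_id}/cal/{cal_id}/rts"


# 1000000000 is a placeholder target observation ID (hypothetical).
print(combined_cal_id(1195317056, 1195316936))
print(combined_cal_dir(1000000000, 1195317056, 1195316936))
```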
Deprecated Methods
These are old methods that are no longer maintained but may be useful if you need to do something specific or if the new scripts have failed.
Download (old python method)
...
The total data volume downloaded will vary, but for maximum duration VCS observations this can easily be ~40 TB of just raw data. It is therefore important to keep in mind the amount of data you are processing and the disk space you are consuming. If only the raw voltages have been downloaded, then you will need to recombine the data yourself, which doubles the amount of data (see next section).
...
Note that this step should be performed automatically by the vcs_download.nf script.
Recombine takes data spread over 32 files per second (each file contains 4 fine channels from one quarter of the array) and recombines them to 24+1 files per second (24 files with 128 fine channels from the entire array and one incoherent sum file); this is done on the GPU cluster ("gpuq") on Galaxy. When downloading the data, if you retrieved the "Processed" (i.e. recombined) data, then ignore this step as it has already been done on the NGAS server.
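To get a feel for the number of files involved, the following Python sketch applies the per-second file counts quoted above (32 raw files in, 24+1 recombined files out) to an observation span. The function name and the 600-second example duration are hypothetical.

```python
RAW_FILES_PER_SECOND = 32       # raw voltage files per second
CHANNEL_FILES_PER_SECOND = 24   # recombined channel files per second
ICS_FILES_PER_SECOND = 1        # incoherent sum file per second


def recombine_file_counts(n_seconds):
    """Expected file counts before and after recombining n_seconds of data."""
    if n_seconds < 0:
        raise ValueError("n_seconds must be non-negative")
    channel = CHANNEL_FILES_PER_SECOND * n_seconds
    ics = ICS_FILES_PER_SECOND * n_seconds
    return {
        "raw": RAW_FILES_PER_SECOND * n_seconds,
        "channel": channel,
        "ics": ics,
        "recombined_total": channel + ics,
    }


# Hypothetical 10-minute span of data.
print(recombine_file_counts(600))
# {'raw': 19200, 'channel': 14400, 'ics': 600, 'recombined_total': 15000}
```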
To recombine all of the data, use
```
process_vcs.py -m recombine -o <obs ID> -a
```
or, for only a subset of data, use
```
process_vcs.py -m recombine -o <obs ID> -b <starting GPS second> -e <end GPS second>
```
If you want to see the progress, then use:
```
squeue -p gpuq -u $USER
```
Generally, this processing should not take too long, typically a few hours.
Checking the recombined data
As before, it is a good idea to check at this stage to make sure that all of the data were recombined properly. To do this, use:
```
checks.py -m recombine -o <obs ID>
```
This will check that all the recombined files are present and of the correct size. If there are missing raw files, the recombining process will make zero-padded files and leave gaps in your data. If you would like to do a more robust check, beamform and splice the data (using the following steps) and then run:
```
prepdata -o recombine_test -nobary -dm 0 <fits files>
```
Then you can look through the produced .dat file for gaps using:
```
exploredat <.dat file>
```
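As a lighter-weight complement to the checks above, the presence of one ICS file per GPS second can be verified with a few lines of Python. This is a sketch only and not a replacement for checks.py: the function name is hypothetical, the GPS range is assumed to be inclusive, and the check cannot detect zero-padded files because it only looks at file names (following the <obsID>_<GPS second>_ics.dat pattern).

```python
import os
import tempfile


def missing_ics_files(directory, obs_id, start_gps, end_gps):
    """List expected ICS file names absent from directory.

    Assumes one '<obsID>_<GPS second>_ics.dat' file per second and an
    inclusive [start_gps, end_gps] range.
    """
    missing = []
    for gps in range(start_gps, end_gps + 1):
        name = f"{obs_id}_{gps}_ics.dat"
        if not os.path.isfile(os.path.join(directory, name)):
            missing.append(name)
    return missing


# Demonstration on a temporary directory with one second deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    for gps in (1000000000, 1000000001, 1000000003):
        open(os.path.join(tmp, f"1000000000_{gps}_ics.dat"), "wb").close()
    demo_missing = missing_ics_files(tmp, 1000000000, 1000000000, 1000000003)

print(demo_missing)  # ['1000000000_1000000002_ics.dat']
```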
Once you are happy that the data have been recombined correctly then you should delete the raw voltages (as they are no longer used in the pipeline and are a massive drain on storage resources).
Beamforming (old python method)
...