Calibrate and WSClean Performance on Garrawarla NVMe Drives
The tables below summarize timing tests run on two compute nodes of Garrawarla (a Hewlett-Packard GPU cluster at the Pawsey Supercomputing Centre) to process and image different MWA observations. The 960GB NVMe drives available on each Garrawarla node are currently underused, even though they provide much faster file read and write times. These tests were run to see whether using the NVMe drives gives a significant speed-up for common calibration and imaging commands used in processing MWA observations.
MWA observations used in timing tests:
- EoR MWA Observation - 2 minute observation with 2s time resolution in the EoR-0 field, ~4GB measurement set
- IPS MWA Observation - 10 minute observation with 10s time resolution, ~40GB measurement set
- GLEAM-X MWA Observation - ~15GB measurement set
In general, the difference in timing is small for the smaller observations but becomes more noticeable as the observations get larger. The largest tested observation (IPS) ran about 17 minutes faster on the NVMe when no outputs were copied back, and about 9 minutes faster when they were. That is a modest gain for a single observation, but when processing a couple of hundred or a couple of thousand observations those minutes add up, so it is worth considering whether to use the NVMe drives. The other major factor is the amount of data (e.g. images, output files) that has to be copied back to /astro (or any other storage system): the less that has to be transferred back, the more useful the NVMe drives become. In the GLEAM-X test, the run that included the transfer back was slower than running on /astro (20m 33s versus 17m 13s).
Timing for Different MWA Observations
Observation | /astro * | NVMe with Data Transfer ** | NVMe without Data Transfer ** |
---|---|---|---|
EoR MWA Observation | 24m 14s | 23m 04s | 22m 49s |
IPS MWA Observation | 3h 55m 16s *** | 3h 46m 41s | 3h 38m 12s |
GLEAM-X MWA Observation | 17m 13s | 20m 33s | 13m 29s |
*All of these timing tests were run before the ASVO delivery system update, so measurement sets were still downloaded from Acacia.
**Data Transfer: For the NVMe tests, the measurement set was downloaded from Acacia straight to the temporary NVMe directory. That download is not counted as data transfer, because the same download was needed on /astro (before the ASVO /astro delivery system was implemented). The data transfer that is included in the timing is the copy of the final images and any other output files from the NVMe back to /astro, since the NVMe drives are not to be used for long-term storage.
***Processing of IPS observations has become much faster as the available supercomputing systems have improved. Around 2020, running the same commands on the IPS observations took ~10 hours on Magnus (another compute system at Pawsey).
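To put the totals above in terms of a larger processing campaign, the per-observation saving can be scaled by the number of observations. The following is a minimal sketch that uses only the IPS timings from the table. The helper names (`to_seconds`, `saved_seconds`) and the batch size of 1000 observations are illustrative assumptions, and it assumes every observation would save the same time as the single one that was tested.

```shell
#!/usr/bin/env bash
# Sketch: scale the per-observation saving from the table to a larger batch.
# Helper names and the batch size are illustrative, not part of the tests.

# Convert a duration such as "3h 55m 16s" or "24m 14s" to whole seconds.
to_seconds() {
    local total=0 tok
    for tok in $1; do
        case "$tok" in
            # 10# forces base 10 so values like "09" are not read as octal.
            *h) total=$(( total + 10#${tok%h} * 3600 )) ;;
            *m) total=$(( total + 10#${tok%m} * 60 )) ;;
            *s) total=$(( total + 10#${tok%s} )) ;;
        esac
    done
    echo "$total"
}

# Seconds saved by the second timing relative to the first (negative = slower).
saved_seconds() {
    echo $(( $(to_seconds "$1") - $(to_seconds "$2") ))
}

n_obs=1000  # hypothetical campaign size

# IPS observation: /astro vs NVMe with and without copying outputs back.
with_transfer=$(saved_seconds "3h 55m 16s" "3h 46m 41s")
without_transfer=$(saved_seconds "3h 55m 16s" "3h 38m 12s")

echo "IPS saving per observation: ${with_transfer}s with transfer, ${without_transfer}s without"
echo "Over ${n_obs} observations: $(( with_transfer * n_obs / 3600 ))h to $(( without_transfer * n_obs / 3600 ))h"
```

For the IPS observation this works out to 515 s saved with the transfer back and 1024 s without it, which over 1000 such observations would be roughly 143 to 284 hours of wall-clock time.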
Timing Breakdown for EoR MWA Observation
Step | /astro | NVMe |
---|---|---|
Snapshot Calibration | 2m 31s | 2m 28s |
Standard Full Calibration | 1m 12s | 1m 09s |
Apply Solutions | 0m 09s | 0m 09s |
Standard Deep Image WSClean | 4m 23s | 3m 42s |
Snapshot Images WSClean | 14m 43s | 14m 01s |
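The page does not say how the per-step times were recorded. One possible way to collect a breakdown like the one above is to wrap each command in a small timer. This is a sketch using bash's built-in `SECONDS` counter; `format_duration` and `time_step` are hypothetical helper names, and `sleep 1` stands in for a real processing command.

```shell
#!/usr/bin/env bash
# Sketch: time one processing step and report it as "Xm YYs".

# Format a number of seconds in the same style as the tables (e.g. "2m 31s").
format_duration() {
    local secs=$1
    printf '%dm %02ds\n' $(( secs / 60 )) $(( secs % 60 ))
}

# Run a command, print "<label> | <elapsed>" to stderr, and keep its exit status.
time_step() {
    local label=$1
    shift
    local start=$SECONDS
    "$@"
    local status=$?
    echo "$label | $(format_duration $(( SECONDS - start )))" >&2
    return "$status"
}

# Stand-in command; in practice this would be the calibrate or wsclean call.
time_step "Example Step" sleep 1
```

`SECONDS` only has one-second resolution, which matches the precision reported in the tables.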
Commands run:
Snapshot Calibration - calibration of each individual time-step in an observation
calibrate -minuv 130 -maxuv 1300 -j 38 -t 1 -m {obsid}_model.txt {obsid}.ms {obsid}_multi.bin
Standard Full Calibration - calibration using the whole observation
calibrate -minuv 130 -maxuv 1300 -j 38 -m {obsid}_model.txt {obsid}.ms {obsid}.bin
Apply Solutions - apply full standard calibration to measurement set
applysolutions -nocopy {obsid}.ms {obsid}.bin
Standard Deep Image WSClean - create a single image using the whole observation
wsclean -j 38 -name {obsid} -pol xx,yy -size 2400 2400 -join-polarizations -minuv-l 50 -weight briggs 1.0 -niter 12000 -auto-mask 3 -auto-threshold 2 -nmiter 5 -scale 1amin -log-time -mgain 0.8 {obsid}.ms
Snapshot Images WSClean - create an image for each time-step of the whole observation
wsclean -j 38 -name {obsid} -subtract-model -pol xx,yy -size 2400 2400 -join-polarizations -minuv-l 50 -weight briggs 1.0 -nwlayers 18 -niter 12000 -auto-mask 3 -auto-threshold 2 -scale 1amin -log-time -no-reorder -no-update-model-required -interval 2 53 -intervals-out 51 {obsid}.ms
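Because the time spent copying results back to /astro decides how much of the NVMe gain survives, one option is to copy back only the products that are actually needed. The sketch below shows one way to do that. It assumes the final images and the calibration solutions are the only files worth keeping, that the WSClean images end in `-image.fits`, and that the solution files end in `.bin` as in the commands above. The function name and the directory arguments are hypothetical placeholders, not actual Garrawarla paths.

```shell
#!/usr/bin/env bash
# Sketch: copy only selected products from node-local scratch to shared storage.
# Which products to keep is an assumption; adjust the patterns as needed.

copy_back_products() {
    local scratch_dir=$1   # hypothetical NVMe working directory
    local dest_dir=$2      # hypothetical destination on /astro
    local obsid=$3
    local f

    mkdir -p "$dest_dir"
    # Final images and calibration solutions only; dirty, model, PSF and
    # residual images stay behind on the scratch drive.
    for f in "$scratch_dir/$obsid"*-image.fits "$scratch_dir/$obsid"*.bin; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        cp "$f" "$dest_dir/"
    done
}

# Example call (variable names are placeholders):
# copy_back_products "$NVME_DIR" "$ASTRO_DIR" "$obsid"
```

Leaving intermediate files on the NVMe keeps the transfer small, which is the case where the tests above showed the largest benefit.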