MWA ASVO Use with HPC Systems
- 1 Introduction
- 2 The Submit, Wait, Download Anti-Pattern
- 3 The Best Practice Pattern: Submit->Check->Download->Process
- 3.1 Job 1: Submit
- 3.1.1 giant-squid example
- 3.1.2 manta-ray-client example
- 3.2 Job 2: Check
- 3.2.1 giant-squid example
- 3.2.2 manta-ray-client example
- 3.3 Job 3: Download
- 3.3.1 giant-squid example
- 3.3.2 manta-ray-client example
- 3.4 Job 4: Process
Introduction
A common use case for MWA ASVO command line clients (manta-ray-client or giant-squid) is as part of a workflow running on a HPC system (e.g. Setonix or Garrawarla at Pawsey). Since HPC resources are almost always a scarce and in-demand resource we need to ensure that the way that we use MWA ASVO is not going to waste those precious resources.
The Submit, Wait, Download Anti-Pattern
Below is a common workflow for MWA ASVO use:
Step | Description | Duration |
---|---|---|
1 | Submit MWA ASVO job | ~1 second |
2 | Wait for MWA ASVO to complete the job. | From a few minutes to several days, depending on the speed of Pawsey and the size of the MWA ASVO queue |
3 | Download the MWA data | <1 hour depending on the size of the data |
4 | Process the MWA data in the rest of your workflow | Depends on your workflow |
A common anti-pattern we see is users creating a single HPC job which implements steps 1, 2, 3 and 4. Step 2 (where the client is waiting for the MWA ASVO job to complete) is often given the largest possible WALLTIME (in Pawsey's case, 24 hours). During that time the HPC node running the job does nothing except poll MWA ASVO, and worse, the resources that node occupies are unavailable to other users for the whole period. This can result in a large fraction of the HPC system being tied up waiting for MWA ASVO.
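As a rough illustration, the anti-pattern looks something like the following batch script (a sketch only: the SLURM directives, the `--wait` flag, and the `process_obs.sh` script are illustrative placeholders, not a recommended recipe):

```shell
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=24:00:00   # maximum WALLTIME reserved "just in case"

# Steps 1+2: submit, then block while polling MWA ASVO.
# The node sits idle for the entire wait.
giant-squid submit-vis --wait OBSID

# Step 3: download the data once the ASVO job is ready
giant-squid download OBSID

# Step 4: process (hypothetical processing script)
./process_obs.sh OBSID
```

Everything before the processing step wastes the compute node's allocation, which is exactly what the pattern below avoids.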
The Best Practice Pattern: Submit->Check->Download->Process
It is therefore highly recommended that users split their workflow into parts that each do one specific job, eliminating this wastage of HPC resources.
Job 1: Submit
Create a job which only submits new jobs to MWA ASVO.
This job can be run from anywhere (e.g. your laptop or an HPC login node), as it is fast and does not require HPC resources.
Step | Description | Duration |
---|---|---|
1 | Submit MWA ASVO job | ~1 second |
giant-squid example
$ giant-squid submit-vis OBSID
manta-ray-client example
$ mwa_client -s joblist.csv
Job 2: Check
Create a job which then periodically checks if the job is ready to download.
Polling more often than once every few minutes is unnecessary and not recommended.
This job can be run from anywhere (e.g. your laptop or an HPC login node), as it is fast and does not require HPC resources.
Step | Description | Duration |
---|---|---|
2 | Check MWA ASVO job is ready for download (or has failed or been cancelled) | ~1 second |
giant-squid example
$ giant-squid list --json
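The JSON output above can be inspected in a script to decide whether to launch the download job. A minimal sketch follows; the `"Ready"` state string and the `jobState` field name are assumptions about giant-squid's JSON output, so confirm them against the output of your giant-squid version:

```shell
# is_ready: succeed if the supplied JSON job listing contains a job in
# the "Ready" state (state string assumed; check your giant-squid output)
is_ready() {
    echo "$1" | grep -q '"Ready"'
}

# In a real workflow the JSON would come from: giant-squid list --json
sample='{"123456": {"obsid": 1234567890, "jobState": "Ready"}}'
if is_ready "$sample"; then
    echo "ready to download"
fi
```

A cron entry on a login node running a check like this every few minutes satisfies the polling guidance above without occupying any compute resources.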
manta-ray-client example
Job 3: Download
When Job 2 detects that the MWA ASVO job is ready for download, it can then launch an HPC job (please use a data-mover or copyq node rather than a compute node) to download the data. If you opted to have MWA ASVO deliver the data directly to /scratch or /astro, you can skip this step.
Step | Description | Duration |
---|---|---|
3 | Download data | <1 hour depending on the size of the data |
giant-squid example
$ giant-squid download JOBID
manta-ray-client example
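A download job might look like the following sketch. The partition name, walltime, and `JOBID` placeholder are assumptions; use whatever data-mover/copy queue your site provides:

```shell
#!/bin/bash -l
#SBATCH --partition=copyq   # data-mover/copy queue; name is site-specific
#SBATCH --ntasks=1
#SBATCH --time=01:00:00     # matches the <1 hour download estimate above

# Download the completed MWA ASVO job (JOBID is a placeholder)
giant-squid download JOBID
```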
Job 4: Process
With the data downloaded, you can then launch your processing job on a compute node of the HPC cluster.
Step | Description | Duration |
---|---|---|
4 | Process the MWA data in the rest of your workflow | Depends on your workflow |
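One way to glue Jobs 3 and 4 together is SLURM's job dependencies, so the processing job is queued immediately but only starts once the download job has succeeded. A sketch, where `download.sh` and `process.sh` are hypothetical batch scripts for Jobs 3 and 4:

```shell
# Submit the download job and capture its job ID (--parsable prints
# just the job ID)
DL_ID=$(sbatch --parsable download.sh)

# Queue the processing job to start only after the download succeeds
sbatch --dependency=afterok:"${DL_ID}" process.sh
```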