...
The MWAX correlator replaces the previous fine PFB, Voltage Capture System (VCS) (and media converter), correlator, and on-site archive of the Murchison Widefield Array (MWA). All of the fielded instrument hardware (tiles, beamformers, receivers) remains the same, as described in The Murchison Widefield Array: The SKA Low Frequency Precursor by Tingay et al. (2013) and in the Phase II description paper, The Phase II Murchison Widefield Array: Design Overview by Wayth et al. (2018). The diagram below shows a high-level overview of the complete signal chain, including the main MWAX components: media conversion and the correlator.
Top-Level Architecture
MWAX is located on site at the MRO and comprises 24 new servers (plus 2 spares), as well as 10 repurposed existing on-site servers. Together the equipment occupies three racks. Output visibilities are transferred to Curtin, and ultimately to Pawsey, via existing fibre-optic links.
...
In this section we describe the flow of signals from the MWA tiles and receivers to the media conversion (medconv) servers, then to the MWAX correlator, and finally into long-term storage in the MWA Archive at the Pawsey Supercomputing Centre.
MWAX Media Conversion
The MWA's existing 16 receivers in the field each send eight tiles' worth of 24 coarse channels, using the Xilinx RocketIO protocol, over a total of 48 fibre optic cables (three per receiver). The fibre optic cables terminate in the MRO Control Building, where two bundles of three fibres connect to each media conversion (medconv) server via custom Xilinx FPGA cards. Six independent processes on each medconv server convert the RocketIO data into Ethernet UDP packets, which are sent to our Cisco Nexus 9504 switch as multicast data, with each coarse channel assigned its own multicast address. This provides the "corner-turn": each of the six processes on each medconv server sends one third of the coarse channels for one sixteenth of the tiles in the array.
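As a sanity check on the corner-turn arithmetic, the sketch below recomputes those fractions from the figures quoted above; the assignment of one fibre (and hence one third of the coarse channels) to each process is an assumption made for illustration.

```python
# Corner-turn arithmetic for the medconv stage, using the figures quoted above.
# The one-fibre-per-process assignment is assumed for illustration.
N_RECEIVERS = 16
TILES_PER_RECEIVER = 8
N_COARSE_CHANNELS = 24
FIBRES_PER_RECEIVER = 3                 # 48 fibres in total
FIBRES_PER_MEDCONV = 6                  # two bundles of three fibres per server
PROCESSES_PER_MEDCONV = 6               # one process per incoming fibre (assumed)

total_tiles = N_RECEIVERS * TILES_PER_RECEIVER                    # 128 tiles
channels_per_process = N_COARSE_CHANNELS // FIBRES_PER_RECEIVER   # 8 coarse channels
tiles_per_process = TILES_PER_RECEIVER                            # 8 tiles

print(channels_per_process / N_COARSE_CHANNELS)   # 0.333... -> one third of the coarse channels
print(tiles_per_process / total_tiles)            # 0.0625   -> one sixteenth of the tiles
```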
MWAX Correlator
This section describes the data flow within the MWAX correlator servers. The function of, and data flow between, the components are shown in the diagram below:
MWAX UDP Capture + Voltage Capture To Disk
As per standard IP multicast, any device on the voltage network can “join” the multicast group for one or more coarse channels, and a copy of the relevant stream will be delivered to it.
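For illustration, a minimal Python sketch of such a join is shown below; the group address and port are placeholders, not the actual addresses assigned to coarse channels at the MRO.

```python
import socket
import struct

# Minimal sketch of subscribing to one coarse channel's multicast stream.
GROUP = "239.255.90.1"   # hypothetical multicast group for one coarse channel
PORT = 59001             # hypothetical port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Join the multicast group on all interfaces (standard IGMP join).
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

packet, addr = sock.recvfrom(65536)   # one UDP packet of voltage data
```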
...
A 256T sub-observation buffer for one coarse channel, covering the full 8 seconds, is approximately 11 GB in size.
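That figure is consistent with critically sampled coarse channels and 8-bit complex voltage samples; the rough check below assumes those values (the sample rate and sample width are assumptions, not taken from this page).

```python
# Rough size check for an 8 s, 256-tile, single coarse channel sub-observation.
# Assumes a critically sampled coarse channel (1.28 Msample/s) and
# 8-bit real + 8-bit imaginary voltage samples (2 bytes per sample).
SAMPLE_RATE = 1_280_000        # samples per second per signal path (assumed)
BYTES_PER_SAMPLE = 2           # 8+8 bit complex (assumed)
SIGNAL_PATHS = 256 * 2         # 256 tiles x 2 polarisations
DURATION_S = 8

size_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * SIGNAL_PATHS * DURATION_S
print(size_bytes / 1e9)        # ~10.5 GB, in line with the ~11 GB quoted above
```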
MWAX Correlator FX Engine
The MWAX correlator FX Engine is implemented as a PSRDADA client: a single process that reads from the input ring buffer and writes to the output ring buffer, working in a closely coupled manner with a single GPU device (which can be any standard NVIDIA/CUDA-based GPU card).
The figure below shows the processing stages and data flows within the MWAX correlator FX Engine process.
The FX Engine treats each 8 second sub-observation as an independent work unit. Most of its mode settings can change on the fly from one sub-observation to the next. Each 8 second sub-observation file contains 160 blocks of 50 ms of input data each. An additional block of metadata (of the same size as a 50 ms data block) is prepended to the data blocks, making a total of 161 blocks per sub-observation file. At the start of processing each new sub-observation file, the metadata block is parsed to configure the operating parameters for the following 160 data blocks.
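The sketch below illustrates only this block structure: assuming equal-sized blocks as described, the block size can be derived from the file size, but the internal layout of the metadata block is not documented here.

```python
import os

# Step through a sub-observation file as 161 equal-sized blocks:
# block 0 is metadata, blocks 1-160 each hold 50 ms of voltage data.
def read_subobservation(path: str):
    """Yield the metadata block, then the 160 data blocks, in order."""
    file_size = os.path.getsize(path)
    block_size = file_size // 161            # metadata block matches a data block in size
    with open(path, "rb") as f:
        yield f.read(block_size)             # block 0: configures this sub-observation
        for _ in range(160):
            yield f.read(block_size)         # blocks 1-160: 50 ms of voltage data each
```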
...
The FFT’d data is then transposed to place it in the order that xGPU requires (slowest-to-fastest changing): [time][channel][tile][polarization]. As the data is re-ordered, it is written directly into xGPU’s input holding buffer in GPU memory. The data from five 50 ms blocks is aggregated in this buffer, corresponding to an xGPU “gulp size” of 250 ms. The minimum integration time is one gulp, i.e. 250 ms. The integration time can be any multiple of 250 ms that yields an integer number of gulps over the full 8 second sub-observation.
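That constraint fixes the set of allowed integration times; the short sketch below enumerates them from the 250 ms gulp size and the 8 s sub-observation length.

```python
# Allowed integration times: multiples of the 250 ms gulp that divide the
# 8 s (8000 ms) sub-observation evenly.
GULP_MS = 250
SUBOBS_MS = 8000

valid = [t for t in range(GULP_MS, SUBOBS_MS + 1, GULP_MS) if SUBOBS_MS % t == 0]
print(valid)   # [250, 500, 1000, 2000, 4000, 8000]
```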
Visibility Channelisation
xGPU places the computed visibilities for each baseline, with 200 Hz resolution (6,400 channels), in GPU memory. A GPU function then performs channel averaging according to the “fscrunch” factor specified in the PSRDADA header, reducing the number of output channels to (6400/fscrunch), each of width (200*fscrunch) Hz. For example, with fscrunch = 50, there will be 128 output visibility channels of 10 kHz each.
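A minimal sketch of this averaging step for a single baseline and polarisation product is shown below, using the example values above; whether any additional scaling is applied during averaging is not specified here.

```python
import numpy as np

# Frequency averaging ("fscrunch") of one baseline/polarisation spectrum:
# 6,400 ultrafine 200 Hz channels -> 6400/fscrunch output channels.
N_ULTRAFINE = 6400
fscrunch = 50                          # value from the PSRDADA header (example)

vis_200hz = np.zeros(N_ULTRAFINE, dtype=np.complex64)   # stand-in for xGPU output

vis_averaged = vis_200hz.reshape(N_ULTRAFINE // fscrunch, fscrunch).mean(axis=1)
print(vis_averaged.shape)              # (128,) -> 128 channels of 10 kHz each
```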
...
The averaged output channel data is transferred back to host memory, where it is re-ordered before being written to the output ring buffer.
Visibility Re-ordering
xGPU utilizes a particular data ordering that is optimized for execution speed. The visibility set is re-ordered by the CPU into a more intuitive triangular order: [time][baseline][channel][polarization]. See: MWAX Visibility File Format
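For reference, the sketch below enumerates baselines in a triangular order (autocorrelations included); the exact conventions used in the output files are defined in MWAX Visibility File Format, so the ordering here is illustrative.

```python
# Enumerate tile pairs in triangular order, autocorrelations included:
# (0,0), (0,1), ..., (0,N-1), (1,1), (1,2), ..., (N-1,N-1).
def triangular_baselines(n_tiles: int):
    for i in range(n_tiles):
        for j in range(i, n_tiles):
            yield (i, j)

baselines = list(triangular_baselines(256))
print(len(baselines))   # 32896 = 256 * 257 / 2 baselines for a 256-tile array
```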
The re-ordered visibility sets (one per integration time) are then written to the output ring buffer.
Visibility Data Capture
The data capture process, running on each MWAX Server, reads visibility data off the output PSRDADA ring buffer and writes the data into FITS format. The data capture process breaks up large visibility sets into files of up to approximately 10 GB each, in order to optimize data transfer speeds while keeping the individual visibility file sizes manageable. The FITS files are written onto a separate partition on the MWAX Server disk storage.
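The sketch below illustrates the general idea of splitting integrations across FITS files at an approximate size limit; it is not the actual MWAX data capture code, and the real HDU structure, headers, and file naming are defined by the MWAX visibility file format.

```python
from astropy.io import fits

# Illustrative only: append successive integrations as image HDUs and start a
# new FITS file once roughly 10 GB has been accumulated.
SIZE_LIMIT_BYTES = 10 * 10**9

def write_visibility_files(integrations, basename):
    file_index, bytes_written = 0, 0
    hdus = fits.HDUList([fits.PrimaryHDU()])
    for vis in integrations:                      # vis: one integration as a numpy array
        hdus.append(fits.ImageHDU(data=vis))
        bytes_written += vis.nbytes
        if bytes_written >= SIZE_LIMIT_BYTES:     # close out this ~10 GB file
            hdus.writeto(f"{basename}_{file_index:03d}.fits", overwrite=True)
            file_index, bytes_written = file_index + 1, 0
            hdus = fits.HDUList([fits.PrimaryHDU()])
    if len(hdus) > 1:                             # flush any remaining integrations
        hdus.writeto(f"{basename}_{file_index:03d}.fits", overwrite=True)
```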
Transfer to Curtin Data Centre Temporary Storage
Each MWAX server has enough disk storage for around 30 TB of visibilities plus 30 TB of voltage data, effectively removing the need for the separate "Online Archive" cluster of servers that the legacy MWA required. Under normal operating modes and schedules, this means the MWA can continue to observe for a week or two even if the link to Perth is offline: data will continue to accumulate on disk until the link comes back online, and will then be transmitted to Perth.
...