...
MWAX has been designed such that exactly the same code used for real-time operation on the dedicated MWAX Servers at the MRO can also be installed on compute nodes with lower-specification GPUs (e.g. at Pawsey) to provide an offline mode, operating below real-time speed. Note: the modes available to the offline correlator depend heavily on the server and GPU hardware on which it is executed.
The table below shows (in green) which output visibility modes of MWAX will be able to operate in real time on the proposed hardware configuration, assuming 256 tiles and 30.72 MHz of instantaneous bandwidth. The figures overlaid on each mode entry give the rate of visibility data generated in that mode, in gigabits per second (Gbps). Modes in red may still be possible for short periods, subject to final hardware specifications and limitations. Note that the number of modes available to astronomers is significantly increased over the legacy correlator. The modes available with 128 tiles are listed at this page: MWAX Correlator Modes (128T)
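As a rough illustration of where such Gbps figures come from, the sketch below estimates the visibility output rate for one hypothetical mode. The 8-byte complex-float visibility size, the inclusion of autocorrelations in the baseline count, and the example channel width and integration time are assumptions for illustration only, not values taken from the mode table.

```python
# Rough estimate of MWAX visibility output rate for one mode.
# All parameter values below are illustrative assumptions.

def visibility_rate_gbps(n_tiles, channel_width_hz, integration_s,
                         bandwidth_hz=30.72e6, n_pols=4, bytes_per_vis=8):
    """Approximate visibility data rate in gigabits per second.

    Assumes complex float32 visibilities (8 bytes) and that
    autocorrelations are included in the baseline count.
    """
    n_baselines = n_tiles * (n_tiles + 1) // 2      # cross + auto correlations
    n_channels = int(bandwidth_hz / channel_width_hz)
    bytes_per_integration = n_baselines * n_channels * n_pols * bytes_per_vis
    return bytes_per_integration * 8 / integration_s / 1e9

# Example: 256 tiles, 10 kHz channels, 2 s integrations (hypothetical mode)
print(f"{visibility_rate_gbps(256, 10e3, 2.0):.2f} Gbps")
```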
Signal Path/Data Flow
...
The FFT’d data is then transposed to place it in the order that xGPU requires (slowest-to-fastest changing): [time][channel][tile][polarization]. As the data is re-ordered, it is written directly into xGPU’s input holding buffer in GPU memory. The data from five 50 ms blocks is aggregated in this buffer, corresponding to an xGPU “gulp size” of 250 ms. The minimum integration time is one gulp, i.e. 250 ms. The integration time can be any multiple of 250 ms that divides evenly into the full 8 second sub-observation.
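As a minimal sketch of these two constraints, the snippet below shows a flat index with [time][channel][tile][polarization] ordering (time slowest, polarization fastest) and enumerates the integration times permitted by the 250 ms gulp and 8 s sub-observation; the array dimensions are placeholders, not the production MWAX values.

```python
# Minimal sketch of the [time][channel][tile][polarization] input ordering
# and of which integration times satisfy the gulp/sub-observation constraints.
# Array dimensions here are placeholders, not the production MWAX values.

def xgpu_input_index(t, c, a, p, n_chan, n_tile, n_pol):
    """Flat index with time slowest and polarization fastest changing."""
    return ((t * n_chan + c) * n_tile + a) * n_pol + p

GULP_MS = 250          # five 50 ms blocks per xGPU gulp
SUB_OBS_MS = 8000      # one 8 s sub-observation

def valid_integration_times_ms():
    """Multiples of one gulp that fit a whole number of times into 8 s."""
    return [t for t in range(GULP_MS, SUB_OBS_MS + 1, GULP_MS)
            if SUB_OBS_MS % t == 0]

print(valid_integration_times_ms())   # [250, 500, 1000, 2000, 4000, 8000]
```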
xGPU utilizes data ordering that is optimized for execution speed. The visibility set is re-ordered into a more intuitive triangular order by the CPU: [time][baseline][channel][pol]. The re-ordered visibility sets (one per integration time) are then written to the output ring buffer.
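One common way to pack such a triangular order is sketched below; the exact baseline ordering convention used by the MWAX re-ordering stage is not specified here, so treat the indexing (ant1 ≤ ant2, autocorrelations included) as an assumption for illustration.

```python
# Illustrative triangular baseline ordering for the re-ordered visibility set.
# The convention (ant1 <= ant2, autocorrelations included) is an assumption,
# not necessarily the exact ordering used by MWAX.

def baseline_index(ant1, ant2):
    """Index of baseline (ant1, ant2) with ant1 <= ant2 in a packed triangle."""
    if ant1 > ant2:
        ant1, ant2 = ant2, ant1
    return ant2 * (ant2 + 1) // 2 + ant1

def n_baselines(n_tiles):
    """Total baselines including autocorrelations."""
    return n_tiles * (n_tiles + 1) // 2

# Example: 128 tiles -> 8256 baselines; (0, 0) is baseline 0, (0, 1) is 1, ...
print(n_baselines(128), baseline_index(0, 0), baseline_index(0, 1))
```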
Visibility Channelisation
xGPU places the computed visibilities for each baseline, with 200 Hz resolution (6,400 channels), in GPU memory. A GPU function then performs channel averaging according to the “fscrunch” factor specified in the metadata block, reducing the number of output channels to (6400/fscrunch), each of width (200*fscrunch) Hz. During this averaging process, each visibility can have a multiplicative weight applied, based on a data occupancy metric that takes account of any input data blocks that were missing due to lost UDP packets or RFI excision (a potential future enhancement). The centre (DC) ultrafine channel is excluded when averaging and the centre output channel values are re-scaled accordingly. Note that only 200 Hz of bandwidth is lost in this process, rather than a complete output channel. The averaged output channel data is then transferred back to host memory.
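The numpy sketch below illustrates the averaging arithmetic for a single baseline and polarization. It is a plausible reading of the description above, not the MWAX implementation: the DC ultrafine channel is assumed to sit at the centre of the band, the per-channel weights are taken as uniform, and the centre output channel is re-scaled by the number of ultrafine channels actually summed.

```python
import numpy as np

# Illustrative channel averaging for one baseline/polarization.
# Assumptions: ultrafine channels are ordered with the DC channel at the
# centre of the band, weights are uniform, and the centre output channel is
# re-scaled by the number of ultrafine channels actually summed.

def fscrunch_average(vis_ultra, fscrunch):
    """Average 200 Hz ultrafine channels down by `fscrunch`.

    vis_ultra : complex array of shape (6400,)  (200 Hz channels)
    returns   : complex array of shape (6400 // fscrunch,)
    """
    n_ultra = vis_ultra.size                   # 6400
    dc = n_ultra // 2                          # centre (DC) ultrafine channel
    n_out = n_ultra // fscrunch

    counts = np.full(n_ultra, 1.0)
    counts[dc] = 0.0                           # exclude DC from the average
    vis = vis_ultra.copy()
    vis[dc] = 0.0

    summed = vis.reshape(n_out, fscrunch).sum(axis=1)
    norm = counts.reshape(n_out, fscrunch).sum(axis=1)   # fscrunch everywhere
    return summed / norm                                 # except the centre

# Example: average to 10 kHz output channels (fscrunch = 50)
rng = np.random.default_rng(0)
ultra = rng.normal(size=6400) + 1j * rng.normal(size=6400)
print(fscrunch_average(ultra, 50).shape)       # (128,)
```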
Visibility Data Capture
The data capture process, running on each MWAX Server, reads visibility data off the output PSRDADA ring buffer and writes the data into FITS format. The data capture process breaks up large visibility sets into files of up to approximately 5 GB each, in order to optimize data transfer speeds while keeping the individual visibility file sizes manageable. The FITS files are written onto a separate partition on the MWAX Server disk storage.
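For a sense of what the ~5 GB limit implies, the sketch below estimates how many integrations fit into one output file; the visibility size, baseline count including autocorrelations, and the example mode parameters are assumptions for illustration, not the data capture code's actual accounting.

```python
# Rough sketch of how many integrations fit within one ~5 GB visibility file.
# Visibility size (8-byte complex), baseline count including autos, and the
# example mode parameters are assumptions for illustration.

FILE_LIMIT_BYTES = 5 * 1024**3        # ~5 GB per output FITS file

def integrations_per_file(n_tiles, n_channels, n_pols=4, bytes_per_vis=8):
    n_baselines = n_tiles * (n_tiles + 1) // 2
    bytes_per_integration = n_baselines * n_channels * n_pols * bytes_per_vis
    return max(1, FILE_LIMIT_BYTES // bytes_per_integration)

# Example: 128 tiles, 3072 x 10 kHz channels (hypothetical mode)
print(integrations_per_file(128, 3072))
```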
...