MWAX Correlator FX Engine details

This page shows some example array shapes and sizes used while processing a typical subobservation.

We have 160 blocks per eight second subobservation, so each block covers 50ms of time.

The sample rate is 1.28MHz (critically sampled) or 1.6384MHz (oversampled, 32/25 * critical sample rate).

FFT

On entry to mwax_process_data_block we have {64000 or 81920} x NINPUTS samples

In critically sampled mode, fft_length is equal to the number of ultrafine channels (the default is 6400, and the commandline currently also sets it to 6400). If we're oversampling, that value is scaled by 32/25 (so, nominally 8192).

So, we have 10 x NINPUTS FFTs to do per block. For 16T (32 inputs), that's 320.
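The arithmetic above can be checked with a small sketch. The function and macro names here are hypothetical and are not taken from the correlator source; only the constants (160 blocks, 8 seconds, 6400 ultrafine channels, the 32/25 ratio) come from this page.

```c
#include <assert.h>

/* Hypothetical names; the constants are the ones quoted on this page. */
#define BLOCKS_PER_SUBOBS 160
#define SUBOBS_SECONDS 8

/* Samples per input per block: sample rate times block length (8 s / 160 = 50 ms). */
static long samples_per_block(long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return sample_rate_hz * SUBOBS_SECONDS / BLOCKS_PER_SUBOBS;
}

/* FFT length: the ultrafine channel count, scaled by 32/25 when oversampled. */
static long fft_length(long num_ultrafine_channels, int oversampled)
{
    return oversampled ? num_ultrafine_channels * 32 / 25 : num_ultrafine_channels;
}

/* Number of FFTs needed per block across all inputs. */
static long ffts_per_block(long sample_rate_hz, long num_ultrafine_channels,
                           int oversampled, long ninputs)
{
    long len = fft_length(num_ultrafine_channels, oversampled);
    return samples_per_block(sample_rate_hz) / len * ninputs;
}
```

For example, ffts_per_block(1638400, 6400, 1, 32) works through the oversampled 16T case: 81920 samples per input, split into 8192-point FFTs, across 32 inputs.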

Figure 1: oversampled, 10 inputs (5 tiles)

Next we discard 1792 frequencies (8192 - 6400) from each oversampled FFT. This is currently done symmetrically, but soon we'll shift the window left a bit so that fine channel 0 can average frequencies centered on the leftmost ultrafine channel.

Figure 2: post fft - extra frequencies discarded

Each block now contains retained_samps_per_block_per_ant * num_input_signal_paths values
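A minimal host-side sketch of the symmetric discard follows. It assumes each FFT output is stored in ascending frequency order, which this page does not state, and the function name is hypothetical. The real step runs on the GPU and may be organised differently.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Keep the central `keep` channels of every `fft_length`-point spectrum,
 * dropping (fft_length - keep) / 2 channels from each edge.
 * Assumption: spectra are contiguous and in ascending frequency order. */
static size_t discard_edge_channels(const float complex *in, float complex *out,
                                    size_t num_ffts, size_t fft_length, size_t keep)
{
    assert(keep <= fft_length && (fft_length - keep) % 2 == 0);
    size_t skip = (fft_length - keep) / 2; /* 896 when going from 8192 to 6400 */
    for (size_t f = 0; f < num_ffts; f++)
        memcpy(out + f * keep, in + f * fft_length + skip, keep * sizeof *out);
    return num_ffts * keep; /* number of values retained */
}
```

With 10 FFTs per input and 6400 channels kept, the return value for one input is 64000, and multiplying by the number of inputs gives the per-block count quoted above.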

Corrections

    if (ctx->apply_path_delays)
    {
        // and if pertinent then also de-ripple at the same time, depending on which LUT is selected in mwax_db2correlate2db_open.
        mwax_lookup_all_delay_gains(&ctx->d_path_delays[input_block_index], ctx->delay_lut, ctx->d_delay_gains.ptr,
                                    ctx->num_input_signal_paths, ctx->num_ultrafine_channels, ctx->num_ffts_per_block, ctx->stream1);
        mwax_fast_complex_multiply(ctx->d_delay_gains.ptr, ctx->d_chan.ptr, ctx->num_retained_samples_per_block, ctx->stream1);
    }
    else if (ctx->apply_deripple) // no delays, just de-ripple
    {
        // form the delay gain array for the current block - same LUT as for fractional delay correction but with delays all zero
        mwax_lookup_deripple_gains(ctx->delay_lut, ctx->d_delay_gains.ptr, ctx->num_input_signal_paths,
                                   ctx->fft_length, ctx->num_ffts_per_block, ctx->stream1);
        mwax_fast_complex_multiply(ctx->d_delay_gains.ptr, ctx->d_chan.ptr, ctx->num_input_samples_per_block, ctx->stream1);
    }

    if (ctx->apply_path_phase_offsets)
    {
        // now the phase corrections
        // form the phase gain array for the current block
        mwax_assemble_all_phase_offsets(ctx->path_phase_offsets_this_block, ctx->d_phase_offsets.ptr, ctx->num_input_signal_paths,
                                        ctx->num_ultrafine_channels, ctx->num_ffts_per_block, ctx->stream1);
        ctx->path_phase_offsets_this_block += ctx->num_input_signal_paths * ctx->num_ffts_per_block; // we're using separate phase gain values for every signal path for every FFT
        mwax_fast_complex_multiply((float *)ctx->d_phase_offsets.ptr, (float *)ctx->d_chan.ptr, ctx->num_retained_samples_per_block, ctx->stream1);
    }

Next, if we're applying delays, we read them from int16_t d_path_delays[160][num_ffts][num_paths]

Those delays are fractions of a sample, expressed in millisamples and ranging from -1000 to 1000.

delay_lut[delay][frequency] contains the phase change to apply. Note that if we're also applying a deripple, then those values are also baked into the LUT.

Note these sum with the phase change from the phase gains below.
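To illustrate how a table indexed by delay and frequency could be laid out, here is a sketch under stated assumptions. The names are hypothetical, the sign convention and the choice of the centre channel as the zero-phase reference are guesses, and the real LUT may store complex gains (with any deripple factor folded in) rather than raw phases.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DELAY_MILLISAMPLES 1000
#define NUM_DELAY_STEPS (2 * MAX_DELAY_MILLISAMPLES + 1) /* -1000 .. 1000 */

/* Row index into a [delay][frequency] table for a delay in millisamples. */
static size_t delay_lut_row(int delay_millisamples)
{
    assert(delay_millisamples >= -MAX_DELAY_MILLISAMPLES);
    assert(delay_millisamples <= MAX_DELAY_MILLISAMPLES);
    return (size_t)(delay_millisamples + MAX_DELAY_MILLISAMPLES);
}

/* Phase in radians produced by a fractional-sample delay at channel `chan`.
 * Assumption: phase is measured relative to the centre channel and a
 * positive delay gives a negative phase slope across the band. */
static double delay_phase(int delay_millisamples, size_t chan, size_t fft_length)
{
    const double two_pi = 6.283185307179586;
    double delay_samples = delay_millisamples / 1000.0;
    double cycles_per_sample =
        ((double)chan - (double)(fft_length / 2)) / (double)fft_length;
    return -two_pi * cycles_per_sample * delay_samples;
}
```

Under these assumptions the table would have NUM_DELAY_STEPS rows, one per millisample step, and a full one-sample delay produces a phase of magnitude pi at the band edge.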

Then come the phase offsets, stored as float complex path_phase_offsets[160][num_ffts][num_paths]

Phase offsets contain a phase per fft per path, and are applied equally to every frequency. They’re only accurate for the center frequency, and are calculated using the coarse_channel_number (as read from the COARSE_CHANNEL field in the PSRDADA header)

// coarse_channel_number is the center frequency of the channel divided by 1.28MHz
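The relationship between the coarse channel number and the center frequency, and the way a single phase is applied across all frequencies of one FFT, can be sketched as follows. This assumes a coarse channel width of 1.28MHz, matching the critical sample rate quoted earlier on this page. Both function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

#define COARSE_CHANNEL_WIDTH_HZ 1280000.0 /* assumed: 1.28MHz per coarse channel */

/* Center frequency implied by a coarse channel number. */
static double coarse_channel_centre_hz(int coarse_channel_number)
{
    assert(coarse_channel_number >= 0);
    return coarse_channel_number * COARSE_CHANNEL_WIDTH_HZ;
}

/* Multiply every channel of one FFT output by the same complex phase gain,
 * i.e. one phase per FFT per path, applied equally to every frequency. */
static void apply_phase_offset(float complex *spectrum, size_t num_channels,
                               float complex gain)
{
    for (size_t c = 0; c < num_channels; c++)
        spectrum[c] *= gain;
}
```

Because the same gain multiplies every channel, the correction is exact only at the frequency the phase was computed for, which is why the offsets are described as accurate for the center frequency only.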

Correlation

xGPU requires the transpose of the ordering we've been using so far, so we call transpose_to_xGPU_kernel to copy the block into the xGPU input buffer. Note that the destination buffer is padded to the input path count xGPU was compiled for. The transpose function skips over the padding, which was cleared when xGPU allocated the buffer.

We process num_input_blocks_per_xgpu_gulp blocks per gulp (set with -a on the commandline; the default is 5), so we do five of these copies before we call xGPU.
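The transpose-with-padding idea can be sketched in plain host C. This is not the transpose_to_xGPU_kernel CUDA kernel, whose exact indexing is not shown on this page. It assumes a source layout of [path][sample] and a destination layout of [sample][padded path], and the function name is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Transpose src[path][sample] into dst[sample][padded_path].
 * Only the first num_paths entries of each destination row are written;
 * the padding entries are skipped and keep whatever they held before
 * (zero, if the buffer was cleared at allocation). */
static void transpose_with_padding(const float complex *src, float complex *dst,
                                   size_t num_paths, size_t num_samples,
                                   size_t padded_paths)
{
    assert(padded_paths >= num_paths);
    for (size_t s = 0; s < num_samples; s++)
        for (size_t p = 0; p < num_paths; p++)
            dst[s * padded_paths + p] = src[p * num_samples + s];
}
```

Skipping the padding rather than rewriting it each time saves work, at the cost of relying on the buffer having been cleared once up front.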

Figure 3: Input transposed and padded to pass to xGPU

    apply_path_weights = header.APPLY_PATH_WEIGHTS;             // currently always set to 0 by udp2sub
    apply_path_delays = header.APPLY_PATH_DELAYS;               // set to subm->CABLEDEL || subm->GEODEL by udp2sub
    apply_path_phase_offsets = header.APPLY_PATH_PHASE_OFFSETS; // set to subm->CABLEDEL || subm->GEODEL by udp2sub
    apply_deripple = header.APPLY_COARSE_DERIPPLE;

If either apply_path_delays or apply_path_phase_offsets is true (it will be both or neither), we initialise both h_path_delays (copied to d_path_delays) and h_path_phase_offset_gains (copied to d_path_phase_offsets).

APPLY_ flags

p_delays    p_phase_offsets    deripple
False       False              False
True        True               False
False       False              True
True        True               True

oversampled    deripple    delay_lut
False          False       d_delay_lut_1
True           False       d_delay_lut_1
False          True        d_delay_lut_2
True           True        d_delay_lut_3
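The LUT selection table can be written as a small function. This simply mirrors the four rows above. It is a sketch with a hypothetical name, not the selection code in mwax_db2correlate2db_open.

```c
#include <stdbool.h>

/* Returns 1, 2 or 3, standing for d_delay_lut_1, d_delay_lut_2 and
 * d_delay_lut_3 respectively, following the rows of the table. */
static int select_delay_lut(bool oversampled, bool deripple)
{
    if (!deripple)
        return 1;               /* same LUT with or without oversampling */
    return oversampled ? 3 : 2; /* deripple LUT depends on the sampling mode */
}
```

Only the deripple case distinguishes between sampling modes. Without deripple, both modes map to the same table.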