Software progress, bugs and solutions of the past, present and future
Dedispersion algorithms w/ EPFL
FDD version of dedisp
(Piyush Panchal)
… Here are the errors I found:
The cuFFT calls (both real-to-complex and complex-to-real in FDDGPUPlan.cpp) were failing silently because the code had no error checking. I would strongly advise checking the return values of the cufftExecR2C and cufftExecC2R functions.
Wrong spin frequencies are computed when nsamp_fft != nsamp (FDDGPUPlan::execute_gpu()). This could be fixed by setting use_zero_padding, which is true by default, to false in the same function.
The phase values multiplied with the Fourier coefficients in fdd_kernel.cuh are incorrectly computed. To shift a discrete time signal a_j (j = 0 .. N-1) by multiplying its Fourier coefficients A_j (j = 0 .. N-1) by complex phasors and then doing an inverse FFT, the phasor multiplying A_j has to be of the form exp(-i * 2 * pi * j/N * delta), where delta is an integer denoting the shift.
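The phasor rule above can be checked numerically. The following is a minimal sketch in plain Python, not the fdd_kernel.cuh code: the helper names (dft, idft, shift_via_phasors) are made up for illustration, and a naive O(N^2) DFT stands in for cuFFT. It assumes the forward transform uses exp(-i * 2 * pi * j * n / N), in which case multiplying A_j by exp(-i * 2 * pi * j/N * delta) delays the signal by delta samples, circularly.

```python
import cmath

def dft(x):
    """Naive forward DFT: X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_samp = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samp) for n in range(n_samp))
            for k in range(n_samp)]

def idft(spectrum):
    """Naive inverse DFT, including the 1/N normalisation."""
    n_samp = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_samp) for k in range(n_samp)) / n_samp
            for n in range(n_samp)]

def shift_via_phasors(a, delta):
    """Delay `a` by the integer `delta` samples via exp(-i*2*pi*k/N*delta)."""
    n_samp = len(a)
    spectrum = dft(a)
    # k plays the role of the coefficient index j in the text above.
    shifted = [spectrum[k] * cmath.exp(-2j * cmath.pi * k * delta / n_samp)
               for k in range(n_samp)]
    # Round away floating-point noise so the result can be compared exactly.
    return [round(v.real, 9) for v in idft(shifted)]

a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
delta = 3
# The phasor multiplication is equivalent to rotating the series right by delta.
assert shift_via_phasors(a, delta) == a[-delta:] + a[:-delta]
```

With the opposite sign in the exponent the series is rotated left instead, so the sign convention has to match the FFT library's forward transform.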
Testing repo:
(Piyush Panchal)
I was thinking about the Fourier domain dedispersion and there seems to be a fundamental problem in this approach which has not been addressed. When shifting a time series (a)_j by multiplying its Fourier coefficients (A)_j with exp(-i 2pi j/N delta), the time series is shifted periodically. That is, if the shift is to the right, some elements at the end of the time series will be moved to the beginning. So, for a particular DM value, the Fourier coefficients obtained by complex multiplication and summing across channels correspond to a time signal for which the delays for the frequency channels have been applied in a periodic sense. This means that at the ends of the time signal, we have some garbage values that we would ideally like to discard, as is done in the time domain version. However, discarding the end values of a discrete time signal in the Fourier domain is a non-trivial task. Actually, I can't think of a way to do it without an inverse transform, which is something we want to avoid; otherwise the advantage of FDD in terms of the number of operations is gone. An alternative is to just use the Fourier coefficients of this contaminated signal, but it is important to be aware of this contamination and maybe even quantify its effect. What are your thoughts?
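The wrap-around described above can be reproduced on a toy example. This is only a sketch under stated assumptions: three channels, 16 samples, made-up integer delays, and hypothetical helpers (fdd_sum, tdd_sum) built on a naive DFT rather than on dedisp. It compares a Fourier domain sum against a time domain sum that discards the samples affected by the delays.

```python
import cmath

def dft(x):
    """Naive forward DFT: X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_samp = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samp) for n in range(n_samp))
            for k in range(n_samp)]

def idft(spectrum):
    """Naive inverse DFT, including the 1/N normalisation."""
    n_samp = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_samp) for k in range(n_samp)) / n_samp
            for n in range(n_samp)]

def fdd_sum(channels, delays):
    """Sum the channels after advancing each one by its delay with phasors."""
    n_samp = len(channels[0])
    total = [0j] * n_samp
    for chan, delay in zip(channels, delays):
        spectrum = dft(chan)
        for k in range(n_samp):
            # Advancing (shifting left) by `delay` uses the conjugate phasor.
            total[k] += spectrum[k] * cmath.exp(2j * cmath.pi * k * delay / n_samp)
    return [v.real for v in idft(total)]

def tdd_sum(channels, delays):
    """Time domain sum that keeps only the samples free of wrap-around."""
    n_valid = len(channels[0]) - max(delays)
    return [sum(chan[t + d] for chan, d in zip(channels, delays)) for t in range(n_valid)]

n_samp, delays = 16, [0, 2, 5]  # hypothetical toy values
# Deterministic "noise" plus a pulse that arrives later in the delayed channels.
channels = [[float((7 * t + 3 * c) % 5) + (10.0 if t == 4 + d else 0.0) for t in range(n_samp)]
            for c, d in enumerate(delays)]

fdd = fdd_sum(channels, delays)
tdd = tdd_sum(channels, delays)
n_valid = len(tdd)
matches = all(abs(f - t) < 1e-9 for f, t in zip(fdd, tdd))
contaminated_tail = fdd[n_valid:]  # mixes in samples from the start of the data
print(n_valid, matches, len(contaminated_tail))
```

In this toy case the two results agree on the first N - max(delay) samples, and the last max(delay) samples of the Fourier domain result contain data wrapped around from the beginning of the delayed channels.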
(Christopher Lee)
I see what you mean, this may be something to consider if you want to take advantage of staying in the Fourier domain. It’s interesting that the authors didn’t mention this in the FDD paper. FDD does have the computational benefit of GPU acceleration though. So even including the inverse FFT, it’s still faster than time domain dedispersion. This is why I’ve been using it in my search pipeline even though dedisp performs the inverse FFT by default. However, my use case isn’t as data intensive as the SMART blind search, since I’m searching far fewer beams. So the added inverse FFT isn’t as much of a problem for me.
(Piyush Panchal)
Yes, it is an important consideration and it seems to be overlooked in the FDD paper! Unfortunately, there is no straightforward solution to this. I am thinking of conducting some numerical experiments where I observe the effect of a contaminated time series in the Fourier domain for different contamination levels. A low DM means lower delays and lower contamination, and vice versa.
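As a starting point for such experiments, the contamination level can be estimated from the dispersion sweep across the band. The sketch below assumes the standard cold-plasma dispersion relation, delay = 4.148808e3 s * DM * (f_lo^-2 - f_hi^-2) with frequencies in MHz and DM in pc cm^-3. The function names, band edges and series length are hypothetical placeholders, not values taken from dedisp or from a particular observation.

```python
import math

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def dispersion_delay(dm, f_lo_mhz, f_hi_mhz):
    """Delay in seconds of the lowest frequency relative to the highest."""
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)

def contaminated_fraction(dm, f_lo_mhz, f_hi_mhz, tsamp, nsamp):
    """Fraction of the output samples touched by wrapped-around data."""
    n_wrapped = math.ceil(dispersion_delay(dm, f_lo_mhz, f_hi_mhz) / tsamp)
    return min(1.0, n_wrapped / nsamp)

# Hypothetical setup: 140-170 MHz band, 0.1 ms sampling, 6e6 samples (600 s).
for dm in (0.0, 10.0, 50.0, 100.0):
    frac = contaminated_fraction(dm, 140.0, 170.0, 1e-4, 6_000_000)
    print(f"DM = {dm:6.1f}  contaminated fraction = {frac:.5f}")
```

The fraction grows linearly with DM until it saturates at 1, which matches the expectation that low DM trials are the least affected.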
GPU-agnostic compilation of dedisp
(Piyush Panchal, 2026-01-24)
Sorry for the delay in pushing the downsampling changes. I was trying to clean up the separate HIP/CUDA branches, which made it difficult to push new changes on Setonix.
With the help of the macros file @Bradley Meyers provided, I have a unified code which can be compiled for both HIP and CUDA. This will make pushing new features on Setonix smoother.
Before the unified code, I worked on a multi-GPU version of dedisp. I will also try to run it on Setonix. If things go well, the MPI version is not far away.
Adding downsampling feature to dedisp
(Piyush Panchal, 2026-01-24)
I have added downsampling and confirmed that it works in some preliminary comparisons with presto's downsampled results. It can be used via the -downsamp flag, just like in presto.
(Christopher Lee, 2026-01-28)
Looks like the downsampling is working as intended. This pulsar's period is 4.86 ms and I made folded plots for native resolution (0.1 ms) and downsampled by 8 (0.8 ms).
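For reference, the effect of downsampling by 8 on the numbers quoted above can be sketched as follows. This is an illustration only: it assumes downsampling means averaging consecutive groups of samples and dropping any incomplete trailing group, which may differ from what dedisp or presto do internally (for example, summing instead of averaging). The function name is hypothetical.

```python
def downsample(series, factor):
    """Average consecutive groups of `factor` samples; drop any partial group."""
    n_out = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) / factor for i in range(n_out)]

tsamp = 0.1e-3    # native resolution quoted above (0.1 ms)
factor = 8        # downsampling factor quoted above
period = 4.86e-3  # pulsar period quoted above (4.86 ms)

series = [float(i % 4) for i in range(20)]  # toy data, not real samples
downsampled = downsample(series, factor)

print(len(series), "->", len(downsampled), "samples")
print("bins per period, native:     ", period / tsamp)
print("bins per period, downsampled:", period / (tsamp * factor))
```

At 0.8 ms the 4.86 ms period spans only about six bins, so the folded profile is expected to look much coarser than at native resolution.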
Adding barycentering feature to dedisp
(Piyush Panchal, 2026-02-18)
I just pushed some changes implementing barycentering. Like presto, it does barycentering by default unless the -nobary flag is added. It is on both Setonix and DUG now at the usual location. For it to work, the environment variable TEMPO must be defined and the tempo executable must be in PATH.
The implementation is a bit naive and uses a lot of memory, so you might be a bit limited in testing. I will try to improve the memory usage. At the moment, I am just interested in seeing if the program logic is fine. I will also try to think of some tests. Like presto, I use tempo to calculate the insertion/deletion positions, but for additions I use the average over a fixed window size (10,000).
(Piyush Panchal, 2026-02-20)
Hey @here. I was wondering whether barycentering the data before dedispersing is the right approach?
I think only one order is the correct one. I am trying to write it up in one page for you to see.
(Piyush Panchal, 2026-02-21)
I have pushed and compiled the latest dedisp code on both Setonix and DUG. It implements barycentering after downsampling. I have tested it a bit on my end (comparison with presto) and it seems to work with multiple FITS files and with downsampling. The tests I did were with FITS files for which barycentering required bin deletion only. In the current implementation, bin addition is a simple duplication. I will try to modify the presto bin addition to a duplication and test if dedisp is doing the right thing.
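The two bin addition strategies mentioned in this thread (duplication, and the average over a fixed window) can be contrasted with a small sketch. Everything here is hypothetical: the function name, its arguments and the tiny window are chosen for illustration, and the positions are passed in by hand rather than computed with tempo. It is not the dedisp or presto implementation.

```python
def apply_bin_edits(samples, add_at, drop_at, fill="duplicate", window=4):
    """Insert one bin after each index in add_at and delete each index in drop_at.

    fill="duplicate" repeats the preceding sample; fill="mean" inserts the
    average of up to `window` samples ending at the insertion point.
    """
    add_at, drop_at = set(add_at), set(drop_at)
    out = []
    for i, value in enumerate(samples):
        if i in drop_at:
            continue  # bin deletion
        out.append(value)
        if i in add_at:  # bin addition
            if fill == "duplicate":
                out.append(value)
            else:
                chunk = samples[max(0, i - window + 1):i + 1]
                out.append(sum(chunk) / len(chunk))
    return out

samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(apply_bin_edits(samples, add_at=[3], drop_at=[6]))
# [0.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 7.0]
print(apply_bin_edits(samples, add_at=[3], drop_at=[6], fill="mean"))
# [0.0, 1.0, 2.0, 3.0, 1.5, 4.0, 5.0, 7.0]
```

For a comparison against presto, both codes need to use the same fill rule, since the inserted values differ between the two strategies even when the positions agree.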
(Bradley Meyers, 2026-02-22)
Yes, I believe it needs to be dedisp -> barycentre. But, you only need to compute the barycentre mask once - i.e., the same sample indices are added/removed.
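One way to reuse the same mask for every DM trial is to build a list of source indices once and apply it to each dedispersed series. The sketch below is an assumption about how this could be organised, with hypothetical names and hand-picked positions, and with bin addition done by duplication.

```python
def build_bary_index(nsamp, add_at, drop_at):
    """Return the source index of every output sample (computed once)."""
    add_at, drop_at = set(add_at), set(drop_at)
    index = []
    for i in range(nsamp):
        if i in drop_at:
            continue          # this sample is removed
        index.append(i)
        if i in add_at:
            index.append(i)   # this sample is duplicated
    return index

def apply_bary_index(series, index):
    """Gather one dedispersed time series through the precomputed index."""
    return [series[i] for i in index]

nsamp = 8
index = build_bary_index(nsamp, add_at=[3], drop_at=[6])  # computed once

# Two toy "DM trials"; the same index is applied to both.
dm_trials = [[float(t) for t in range(nsamp)],
             [float(10 * t) for t in range(nsamp)]]
barycentred = [apply_bary_index(series, index) for series in dm_trials]
print(index)
print(barycentred[1])
```

Because the index depends only on the sample positions, the cost of computing it is paid once rather than once per DM trial.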
(Christopher Lee, 2026-02-24)
These are the results with barycentering and with/without downsampling. The profile looks correct. I’m now running prepsubband to compare the barycentred results.
prepsubband results
(Piyush Panchal, 2026-02-24)
The end bit is probably because prepsubband somehow chooses a "good output" length. I think the dedisp output length is slightly larger than that.
(Christopher Lee, 2026-02-24)
This is J0737-3039A. I’m assuming the offset in P0 is from the binary motion of the pulsar, but I will try with -nobary as well to see. Or, it could be because prepfold thinks it’s in the topocentric frame but it isn’t.
I think the issue is that the MJD epoch also needs to be updated [to the barycentre frame] in the inf file. It’s not as easy as changing barycentred from 0 to 1.
(Piyush Panchal, 2026-02-24)
I see. I will fix that as well. Thanks for pointing it out. Meanwhile, can you copy the right topo time to the dedisp inf to check?
(Christopher Lee, 2026-02-24)
Ok dedisp matches prepsubband when I fix the epoch
(Christopher Lee, 2026-02-24)
Also, the minus sign before the declination is missing in the dedisp inf file (and the plot)
J2000 Declination (dd:mm:ss.ssss) = 30:39:40.71
should be
J2000 Declination (dd:mm:ss.ssss) = -30:39:40.7100
(Bradley Meyers, 2026-02-25)
Ah yeah you will also need to correct the epoch to the equivalent barycentric arrival time in addition to messing around with the data values themselves. Forgot about that bit
(Piyush Panchal, 2026-03-16)
I have fixed the following things in the inf files:
TOA values for barycentered data at various DM values
Declination sign
Barycentered=1 for barycentered data
The changes are now available on both Setonix and DUG
(Christopher Lee, 2026-03-31)
I re-ran the code with the new fixes to the inf files. I can see in the inf files that the dedisp epochs match prepsubband now. Here's a comparison of the results. The only noticeable difference is the number of samples, which isn't an issue.
Unrelated to this, but I saw a 20-sigma improvement for J0737 with the newly beamformed data compared to the old data, probably due to the calibration.
VDIF smearing bug (solved)
See Issue 45 and PR 69 for GitHub-related content. Sam McSweeney ultimately found the error and corrected it. Now that we know the answers, we should turn these notes into a unit test that can be run on both Legacy and MWAX VCS data in the future. The test data and associated scripts will live in Acacia in the mwavcs:tests bucket.
(Sammy’s notes on diagnosing and correcting the problem, with demonstrations)