Massively Parallel Fourier Transform: Application to the Unbounded Poisson Equation and Performance Analysis

  • Balty, Pierre (UCLouvain)
  • Chatelain, Philippe (UCLouvain)
  • Gillis, Thomas (MIT)

Please login to view abstract download link

Poisson equations appear in a wide range of physical problems, whose simulations often require lots of computational resources. In time-dependent applications, they are solved many times and their time-to-solution constitutes a significant part of the total simulation time. Increasing the Poisson solver’s efficiency is therefore crucial to reduce the computational cost of those simulations. In a high computing context, it is a particularly challenging problem, as their resolutions only rely on a few operations and memory accesses, while more than 50% of their time-to-solution is spent in communications. The FLUPS library, a Fourier-based library of unbounded Poisson solvers, tackles that issue with an implementation of the distributed Fourier transform tailor-made for successive resolution of the unbounded Poisson problem. It solves the Poisson equation using a convolution between the right-hand side f and a pre-computed Green function G. The convolution is performed as a point-wise multiplication in the spectral space. Its implementation therefore closely relates to the implementation of a parallel distributed 3D FFT transform, as it requires to compute the spectral representation of both the right-hand side and the Green function. The 3D FFT is computed as a succession of 1D FFTs in each direction. 1D FFTs are performed using the FFTW library while FLUPS relies on the message passing interface (MPI) to realign the data in memory between each 1D transform. However the proposed implementation only supports cell-centred data layout and features basic communication strategies. In this work, three extensions of the FLUPS library are presented: first, we generalise its implementation to support node-centred data layout; second, we introduce the ability to solve the Biot-Savart problem; and third we provide three distinct implementations to handle the inescapable communications: a commonly proposed all-to-all implementation, and two non-blocking versions relying respectively on manual packing and MPI DATATYPE to communicate over the network. We validated the proposed software against different analytical solutions. We then obtain times-to-solution in the periodic case that are similar to one of the fastest third-party distributed FFT Poisson solvers, accFFT. Finally, we analyse and detail the performance metric for each implementation, which has been performed using various top-tier European infrastructures with up to 50k cores.