$\texttt{cunuSHT}$: GPU Accelerated Spherical Harmonic Transforms on Arbitrary Pixelizations

Sebastian Belkner, Adriaan J. Duivenvoorden, Julien Carron, Nathanael Schaeffer, Martin Reinecke
2024-06-21
Abstract: We present $\texttt{cunusht}$, a general-purpose Python package that wraps a highly efficient CUDA implementation of the nonuniform spin-$0$ spherical harmonic transform. The method is applicable to arbitrary pixelization schemes, including schemes constructed from equally-spaced iso-latitude rings as well as completely nonuniform ones. The algorithm has an asymptotic scaling of $\mathrm{O}(\ell_{\rm max}^3)$ for maximum multipole $\ell_{\rm max}$ and achieves machine-precision accuracy. While $\texttt{cunusht}$ is developed with applications in cosmology in mind, it is applicable to various other interpolation problems on the sphere. We outperform the fastest available CPU algorithm by a factor of up to 5 for problems with a nonuniform pixelization and $\ell_{\rm max}>4\cdot10^3$ when comparing a single modern GPU to a modern 32-core CPU. This performance is achieved by utilizing the double Fourier sphere method in combination with the nonuniform fast Fourier transform and by avoiding transfers between the host and device. For scenarios without GPU availability, $\texttt{cunusht}$ wraps existing CPU libraries. $\texttt{cunusht}$ is publicly available and includes tests, documentation, and demonstrations.
Instrumentation and Methods for Astrophysics
What problem does this paper attempt to address?
This paper aims to improve the computational efficiency and accuracy of the spherical harmonic transform (SHT) on arbitrary pixelization schemes, especially for non-uniformly sampled data. To this end, the authors developed cunuSHT, a Python package that wraps a CUDA implementation of the non-uniform spherical harmonic transform (nuSHT) and runs efficiently on the GPU.

### Main Problems and Background

1. **High Computational Complexity**: Evaluating the SHT directly on non-uniformly sampled points costs \(O(\ell_{\text{max}}^4)\) operations, which becomes prohibitive for large problems.
2. **Processing of Non-Uniformly Sampled Data**: In many practical applications the data points are irregularly distributed (non-uniformly sampled), which further increases the computational difficulty.
3. **Limitations of Existing Methods**: Existing CPU algorithms have limited performance on non-uniformly sampled data, especially at high resolution (for example, \(\ell_{\text{max}}>4\times10^3\)).

### Solutions

cunuSHT addresses these problems in the following ways:

- **GPU Acceleration**: The transform exploits the parallel computing capabilities of the GPU. For non-uniform pixelizations and \(\ell_{\text{max}}>4\times10^3\), a single modern GPU outperforms the fastest available CPU algorithm running on a modern 32-core CPU by a factor of up to 5.
- **Double Fourier Sphere Method (DFS)**: By combining the double Fourier sphere method with the non-uniform fast Fourier transform (nuFFT), the interpolation problem on the sphere is transformed into a non-uniform fast Fourier transform problem on the torus, reducing the computational complexity to \(O(\ell_{\text{max}}^3)\).
- **Avoiding Data Transfers between the Host and the Device**: All intermediate results are kept in GPU memory, which removes the time cost of host-device transfers and lets cunuSHT be integrated into larger GPU computing pipelines.

### Application Areas

Although cunuSHT was developed with applications in cosmology in mind, it is applicable to other tasks that require interpolation or signal processing on the sphere, such as:

- Weak gravitational lensing of the cosmic microwave background (CMB)
- Radar tracking
- Meteorology
- Solar physics
- Solving partial differential equations on the sphere

### Summary

By combining GPU acceleration, the DFS and nuFFT methods, and the avoidance of host-device transfers, cunuSHT provides a spherical harmonic transform for non-uniformly sampled data that scales as \(O(\ell_{\text{max}}^3)\), reaches machine-precision accuracy, and runs up to 5 times faster than the fastest available CPU algorithm.
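The double Fourier sphere step listed under Solutions can be illustrated with a small NumPy sketch. It is not the cunuSHT API and does not reproduce the authors' CUDA code: the function names, the grid sizes, and the test signal are assumptions made for illustration, and the nuFFT is replaced by a direct (slow) non-uniform Fourier sum so that the example has no dependency beyond NumPy. The sketch samples a band-limited function on an equiangular grid, extends it to a doubly periodic function on the torus via \(\tilde f(2\pi-\theta,\varphi)=f(\theta,\varphi+\pi)\), takes a 2D FFT, and evaluates the resulting Fourier series at random points on the sphere.

```python
import numpy as np


def sphere_function(theta, phi):
    # Hypothetical band-limited test signal: z + x + (x^2 - y^2) on the unit sphere.
    return (np.cos(theta)
            + np.sin(theta) * np.cos(phi)
            + np.sin(theta) ** 2 * np.cos(2 * phi))


def dfs_extend(f_grid):
    # f_grid has shape (n_theta, n_phi), with theta in [0, pi] including both
    # poles and phi in [0, 2*pi) with an even number of samples.
    # Rows for theta in (pi, 2*pi) are the interior rows in reverse order,
    # shifted by pi in phi (half the number of phi samples).
    n_phi = f_grid.shape[1]
    mirrored = np.roll(f_grid[-2:0:-1], n_phi // 2, axis=1)
    return np.concatenate([f_grid, mirrored], axis=0)


def torus_fourier_coefficients(f_torus):
    # Normalized 2D FFT: f(theta, phi) = sum_ab c[a, b] exp(i (a theta + b phi)).
    return np.fft.fft2(f_torus) / f_torus.size


def evaluate_nonuniform(coeffs, theta, phi):
    # Direct non-uniform Fourier sum. A nuFFT computes the same quantity
    # much faster; the direct sum is used here only for clarity.
    n_t, n_p = coeffs.shape
    k_theta = np.fft.fftfreq(n_t, d=1.0 / n_t)  # integer frequencies
    k_phi = np.fft.fftfreq(n_p, d=1.0 / n_p)
    e_theta = np.exp(1j * np.outer(theta, k_theta))
    e_phi = np.exp(1j * np.outer(phi, k_phi))
    return np.einsum("pa,ab,pb->p", e_theta, coeffs, e_phi).real


# Equiangular grid: 9 colatitudes including the poles, 16 longitudes (assumed sizes).
n_theta, n_phi = 9, 16
theta_grid = np.linspace(0.0, np.pi, n_theta)
phi_grid = 2.0 * np.pi * np.arange(n_phi) / n_phi
f_grid = sphere_function(theta_grid[:, None], phi_grid[None, :])

coeffs = torus_fourier_coefficients(dfs_extend(f_grid))

# Random (non-uniform) evaluation points on the sphere.
rng = np.random.default_rng(0)
theta_pts = rng.uniform(0.0, np.pi, 50)
phi_pts = rng.uniform(0.0, 2.0 * np.pi, 50)

approx = evaluate_nonuniform(coeffs, theta_pts, phi_pts)
max_error = np.max(np.abs(approx - sphere_function(theta_pts, phi_pts)))
print("max interpolation error:", max_error)
```

Because the test signal is band-limited well below the grid's Nyquist frequency, the Fourier series on the torus reproduces it at the random points up to rounding error. The direct sum costs one operation per pair of evaluation point and Fourier coefficient; replacing it with a nuFFT is what removes that quadratic cost in practice.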