Abstract: It is shown that micromagnetic and atomistic spin dynamics simulations can use multiple GPUs, both to reduce computation time and to allow for a larger simulation size than is possible on a single GPU. Whilst interactions which depend on neighbouring spins, such as exchange interactions, may be implemented efficiently by transferring data between GPUs using halo regions, or alternatively using direct memory accesses, implementing the long-range demagnetizing interaction is the main difficulty in achieving good performance scaling, as the data transfer rate between GPUs is a significant bottleneck. A multi-GPU convolution algorithm is developed here, which relies on single-GPU FFTs executed in parallel. It is shown that even for micromagnetic simulations where the demagnetizing interaction computation time dominates, good performance scaling may be achieved, with speedup factors up to 1.8, 2.5, and 3.1, for 2, 3, and 4 GPUs respectively. The code developed here can be used for any number of GPUs in parallel, with performance scaling strongly dependent on inter-GPU data transfer rate and connection topology. Performance scaling is further improved in micromagnetic simulations which include a spin transport solver, obtaining speedup factors up to 1.96, 2.8, and 3.7, for 2, 3, and 4 GPUs respectively. The best case scenario is obtained for atomistic spin dynamics simulations, where the demagnetizing interaction is implemented with spin-averaged cells. Using a single workstation with 4 GPUs, it is shown that atomistic spin dynamics simulations with up to 1 billion spins, and atomistic Monte Carlo simulations with up to 2 billion spins, are possible, with near-ideal performance scaling.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
The paper addresses how to use multiple GPUs to accelerate micromagnetic and atomistic spin dynamics simulations, and to run simulations larger than a single GPU can hold. Specifically, the authors develop a multi-GPU convolution algorithm for the long-range demagnetizing interaction, which is the main obstacle to good performance scaling across GPUs.
#### Main Issues:
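1. **Efficient Implementation of the Long-Range Demagnetizing Interaction**: Unlike neighbour-dependent interactions such as exchange, which only need halo regions or direct memory accesses between GPUs, the long-range demagnetizing interaction requires substantial inter-GPU data transfer and is the main difficulty in achieving good multi-GPU performance scaling. The paper develops a multi-GPU convolution algorithm to address this.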
2. **Increasing Computational Scale**: By using multiple GPUs, larger-scale simulation problems can be handled, overcoming the limitations of a single GPU.
3. **Data Transfer Bottleneck**: The data transfer rate between multiple GPUs is a key factor affecting performance. The paper proposes a mixed-precision method to reduce the amount of data transfer, thereby improving overall computational efficiency.
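
The effect of reduced-precision transfer on data volume can be illustrated with a small sketch. This is only an assumption-laden illustration: the summary does not state which quantities are transferred or which precisions are used, so the example assumes single precision (float32) values converted to half precision (float16) before a hypothetical inter-GPU copy. The function names `pack_for_transfer` and `unpack_after_transfer` are made up for this example.

```python
import struct

def pack_for_transfer(values, half=True):
    """Serialise values as float16 ('e') or float32 ('f').

    Stands in for the buffer sent over a hypothetical inter-GPU link.
    """
    code = "e" if half else "f"
    return struct.pack("<%d%s" % (len(values), code), *values)

def unpack_after_transfer(buf, half=True):
    """Recover Python floats from a buffer produced by pack_for_transfer."""
    size = 2 if half else 4
    code = "e" if half else "f"
    return list(struct.unpack("<%d%s" % (len(buf) // size, code), buf))

# A unit magnetization direction (0.6^2 + 0.48^2 + 0.64^2 = 1).
m = [0.6, -0.48, 0.64]

full = pack_for_transfer(m, half=False)     # assumed compute precision
reduced = pack_for_transfer(m, half=True)   # assumed transfer precision
restored = unpack_after_transfer(reduced, half=True)

# Largest rounding error introduced by the reduced-precision round trip.
max_err = max(abs(a - b) for a, b in zip(m, restored))

print(len(full), "bytes at float32 vs", len(reduced), "bytes at float16")
print("max round-trip error:", max_err)
```

Halving the bytes per value halves the transfer volume, at the cost of a rounding error that is bounded by the half-precision format. Whether that error is acceptable depends on the quantity being transferred.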
#### Specific Methods:
- **Multi-GPU Convolution Algorithm**: Computes the long-range demagnetizing interaction as a convolution built from single-GPU FFTs executed in parallel, one set per GPU, with data exchanged between GPUs during the computation.
- **Data Transfer Optimization**: Transfers data between GPUs at reduced precision (mixed precision), which lowers the volume of data moved and so eases the transfer bottleneck.
- **Performance Testing under Different Connection Topologies**: Compares the performance differences under point-to-point connections (such as NVSwitch) and bus connections.
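
The idea of building a multi-device convolution from per-device transforms can be sketched as follows. This is a minimal CPU-only illustration and not the paper's implementation. It assumes a scalar field on a square periodic grid (a real demagnetizing-field computation uses a tensor kernel and zero padding), uses a naive DFT as a stand-in for a single-GPU FFT call, and models the inter-GPU exchange as an in-memory transpose. All function names are hypothetical.

```python
import cmath

def dft(v, inverse=False):
    """Naive O(n^2) 1D DFT; stands in for a single-device FFT call."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
               for m in range(n)) for k in range(n)]
    return [x / n for x in out] if inverse else out

def transpose_exchange(slabs):
    """Stand-in for the all-to-all inter-device transfer.

    Each input slab holds full-length lines along one axis; each output
    slab holds full-length lines along the other axis.
    """
    lines = [line for slab in slabs for line in slab]
    swapped = [list(c) for c in zip(*lines)]
    per = len(swapped) // len(slabs)
    return [swapped[d * per:(d + 1) * per] for d in range(len(slabs))]

def dft2_transposed(a):
    """2D DFT of a[y][x], returned with layout [kx][ky]."""
    rows = [dft(r) for r in a]
    return [dft(list(c)) for c in zip(*rows)]

def multi_device_convolve(field, kernel_hat_t, num_devices):
    """Circular convolution of field[y][x], split into row slabs."""
    n = len(field)
    per = n // num_devices  # assumes n is divisible by num_devices
    slabs = [field[d * per:(d + 1) * per] for d in range(num_devices)]
    # 1. Each device transforms its own rows along x.
    slabs = [[dft(row) for row in slab] for slab in slabs]
    # 2. Exchange so each device owns complete lines along y.
    slabs = transpose_exchange(slabs)
    # 3. Transform along y, multiply by the kernel, transform back along y.
    for d, slab in enumerate(slabs):
        for i, col in enumerate(slab):
            spectrum = dft(col)
            k_col = kernel_hat_t[d * per + i]
            slab[i] = dft([s * k for s, k in zip(spectrum, k_col)],
                          inverse=True)
    # 4. Exchange back and finish with the inverse transform along x.
    slabs = transpose_exchange(slabs)
    slabs = [[dft(row, inverse=True) for row in slab] for slab in slabs]
    return [[x.real for x in row] for slab in slabs for row in slab]

def direct_convolve(field, kernel):
    """Reference O(n^4) circular convolution for checking the result."""
    n = len(field)
    return [[sum(kernel[(y - yy) % n][(x - xx) % n] * field[yy][xx]
                 for yy in range(n) for xx in range(n))
             for x in range(n)] for y in range(n)]

field = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 3.0, 1.0],
         [2.0, -2.0, 1.0, 0.0], [0.0, 1.0, -1.0, 4.0]]
kernel = [[1.0, 0.5, 0.0, 0.5], [0.5, 0.25, 0.0, 0.25],
          [0.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.0, 0.25]]

result = multi_device_convolve(field, dft2_transposed(kernel), num_devices=2)
reference = direct_convolve(field, kernel)
max_diff = max(abs(a - b) for ra, rb in zip(result, reference)
               for a, b in zip(ra, rb))
print("max difference from direct convolution:", max_diff)
```

In this sketch the only inter-device communication is the two `transpose_exchange` steps. That is consistent with the observation that scaling depends strongly on the inter-GPU transfer rate and connection topology, since the transforms themselves run independently on each device.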
#### Experimental Results:
- On a single workstation with 4 GPUs, atomistic spin dynamics simulations with up to 1 billion spins, and atomistic Monte Carlo simulations with up to 2 billion spins, are possible with near-ideal performance scaling.
- In micromagnetic simulations where the demagnetizing interaction computation time dominates, speedup factors reach up to 1.8, 2.5, and 3.1 (for 2, 3, and 4 GPUs, respectively).
- For micromagnetic simulations that include a spin transport solver, speedup factors increase further, up to 1.96, 2.8, and 3.7 (for 2, 3, and 4 GPUs, respectively).
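
To put these speedup factors in context, they can be converted to parallel efficiency, defined here as speedup divided by the number of GPUs (1.0 would be ideal scaling). The short calculation below uses only the figures quoted above. Since those are "up to" values, the resulting efficiencies are best-case numbers. The dictionary labels and the function name are illustrative choices.

```python
# Best-case speedup factors quoted in the abstract, keyed by GPU count.
speedups = {
    "demag-dominated micromagnetics": {2: 1.8, 3: 2.5, 4: 3.1},
    "micromagnetics with spin transport": {2: 1.96, 3: 2.8, 4: 3.7},
}

def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling achieved."""
    return speedup / num_gpus

efficiency = {
    name: {n: parallel_efficiency(s, n) for n, s in table.items()}
    for name, table in speedups.items()
}

for name, table in efficiency.items():
    for n, eff in sorted(table.items()):
        print("%s, %d GPUs: %.1f%% of ideal" % (name, n, 100 * eff))
```

For example, a speedup of 3.1 on 4 GPUs corresponds to an efficiency of 3.1 / 4 ≈ 0.78, while 3.7 on 4 GPUs corresponds to about 0.93.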
### Summary
The paper develops a multi-GPU convolution algorithm for the long-range demagnetizing interaction, which is the main obstacle to good multi-GPU performance scaling. Combined with reduced inter-GPU data transfer, it yields speedup factors of up to 3.1 to 3.7 on 4 GPUs for micromagnetic simulations and near-ideal scaling for atomistic spin dynamics simulations, while also allowing larger simulations than a single GPU can hold.