

GPU4PySCF is a GPU plugin for PySCF, focusing on performance and industrial applications.
GitHub Repo (https://github.com/pyscf/gpu4pyscf)
This notebook was created by Xiaojie Wu (wxj6000@gmail.com)
Prerequisite
Basic knowledge of molecular simulation and the fundamentals of PySCF is required for this notebook.
The following examples are expected to run in an environment with:
- CUDA 12.x
- Python3
- Tesla T4, 16 GB memory
- CuPy >= 13.0
- GPU4PySCF >= 1.0.0
- PySCF >= 2.6.0
Check GPU availability
Please make sure a GPU is connected before executing the following tasks.
nvidia-smi output (Wed Jun 5 02:22:51 2024):
NVIDIA-SMI 535.104.05 | Driver Version: 535.104.05 | CUDA Version: 12.2
GPU 0: Tesla T4 | Persistence-M: Off | Bus-Id: 00000000:00:04.0 | Disp.A: Off | Volatile Uncorr. ECC: 0
Fan: N/A | Temp: 34C | Perf: P8 | Pwr:Usage/Cap: 9W / 70W | Memory-Usage: 0MiB / 15360MiB | GPU-Util: 0% | Compute M.: Default | MIG M.: N/A
Processes: No running processes found
A Tesla T4 is available and the CUDA driver is installed.
Install GPU4PySCF from PyPI
GPU4PySCF is released as two packages on PyPI: gpu4pyscf-cuda11x for CUDA 11.x and gpu4pyscf-cuda12x for CUDA 12.x.
Our current environment runs CUDA 12.x, so we install the latest version of gpu4pyscf-cuda12x.
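If you are unsure which of the two packages matches your machine, the choice can be made from the CUDA version that the driver reports. The helper below is a minimal sketch; `gpu4pyscf_package` is a hypothetical name and not part of GPU4PySCF. It assumes CuPy's `cupy.cuda.runtime.driverGetVersion()`, which encodes CUDA 12.2 as `12020`.

```python
def gpu4pyscf_package(cuda_version: int) -> str:
    """Hypothetical helper: map a CUDA version integer (e.g. 12020 for CUDA 12.2)
    to the matching GPU4PySCF package name on PyPI."""
    major = cuda_version // 1000
    if major == 11:
        return "gpu4pyscf-cuda11x"
    if major == 12:
        return "gpu4pyscf-cuda12x"
    raise ValueError(f"no GPU4PySCF package listed for CUDA {major}.x")


if __name__ == "__main__":
    try:
        import cupy
        # CUDA version supported by the installed driver, e.g. 12020.
        version = cupy.cuda.runtime.driverGetVersion()
        print("pip install", gpu4pyscf_package(version))
    except (ImportError, RuntimeError, ValueError) as exc:
        # No CuPy, no usable driver, or an unlisted CUDA version:
        # read the CUDA version from nvidia-smi by hand instead.
        print("Could not pick a package automatically:", exc)
```

On the T4 machine above, whose driver reports CUDA 12.2, this selects gpu4pyscf-cuda12x.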
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: gpu4pyscf-cuda12x in /opt/mamba/lib/python3.10/site-packages (1.0)
Requirement already satisfied: pyscf~=2.6.0 in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (2.6.2)
Requirement already satisfied: geometric in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.0.2)
Requirement already satisfied: gpu4pyscf-libxc-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (0.4)
Requirement already satisfied: cupy-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (13.2.0)
Requirement already satisfied: pyscf-dispersion in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.1.0)
Requirement already satisfied: scipy!=1.5.0,!=1.5.1 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.11.4)
Requirement already satisfied: numpy!=1.16,!=1.17,>=1.13 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.26.2)
Requirement already satisfied: h5py>=2.7 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (3.10.0)
Requirement already satisfied: setuptools in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (65.5.0)
Requirement already satisfied: fastrlock>=0.5 in /opt/mamba/lib/python3.10/site-packages (from cupy-cuda12x->gpu4pyscf-cuda12x) (0.8.2)
Requirement already satisfied: networkx in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (3.3)
Requirement already satisfied: six in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cutensor-cu12
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ed/d6/61fc3511bc9e4cdb423b69964e3d344090b4093cbf9d3c8cc469ef4642d0/cutensor_cu12-2.0.2-py3-none-manylinux2014_x86_64.whl (156.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.9/156.9 MB 2.2 MB/s eta 0:00:00
Installing collected packages: cutensor-cu12
Successfully installed cutensor-cu12-2.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Quick Start
Basics of PySCF
GPU4PySCF has the same syntax as PySCF. Let us take the water molecule as a simple example to show some basics of PySCF. We calculate the SCF energy, the analytical gradient, and the analytical Hessian with density fitting. Please refer to the PySCF documentation for the detailed APIs.
output file: pyscf.log
/opt/mamba/lib/python3.10/site-packages/pyscf/dft/libxc.py:1110: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
  warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
/opt/mamba/lib/python3.10/site-packages/cupy/cuda/compiler.py:233: PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...
  jitify._init_module()
to_gpu && to_cpu
to_gpu converts a PySCF object into a GPU4PySCF object.
to_cpu converts a GPU4PySCF object into a PySCF object.
Note: not all PySCF classes have a to_gpu method implemented. Please check the GitHub repo for the supported functionalities.
converged SCF energy = -75.2427927513248
Now the GPU task is done. We go back to the CPU and apply the analysis methods available in PySCF.
**** MO energy ****
MO #1 energy= -18.5309130361177 occ= 2
MO #2 energy= -0.867406722063318 occ= 2
MO #3 energy= -0.430552861990662 occ= 2
MO #4 energy= -0.287810535694673 occ= 2
MO #5 energy= -0.213845863536424 occ= 2
MO #6 energy= 0.0279360174058646 occ= 0
MO #7 energy= 0.103970833518282 occ= 0
MO #8 energy= 0.365852236362224 occ= 0
MO #9 energy= 0.401817990005324 occ= 0
MO #10 energy= 0.417092793857731 occ= 0
MO #11 energy= 0.427089684574495 occ= 0
MO #12 energy= 0.533162356854433 occ= 0
MO #13 energy= 0.559423477534688 occ= 0
MO #14 energy= 0.642324854221829 occ= 0
MO #15 energy= 0.671254075360752 occ= 0
MO #16 energy= 0.763571220143496 occ= 0
MO #17 energy= 0.997977782675208 occ= 0
MO #18 energy= 1.15599649400448 occ= 0
MO #19 energy= 1.21701454122243 occ= 0
MO #20 energy= 1.69490924488968 occ= 0
MO #21 energy= 1.73123719335389 occ= 0
MO #22 energy= 1.76304918092714 occ= 0
MO #23 energy= 1.79613298056702 occ= 0
MO #24 energy= 1.8517827395061 occ= 0
MO #25 energy= 2.06436606632415 occ= 0
MO #26 energy= 2.33338168329192 occ= 0
MO #27 energy= 2.48177986111836 occ= 0
MO #28 energy= 2.66086177703876 occ= 0
MO #29 energy= 3.02456037955242 occ= 0
MO #30 energy= 3.12254899147704 occ= 0
MO #31 energy= 3.23429663064129 occ= 0
MO #32 energy= 3.2351676434303 occ= 0
MO #33 energy= 3.259149997086 occ= 0
MO #34 energy= 3.34031726662165 occ= 0
MO #35 energy= 3.49577379650828 occ= 0
MO #36 energy= 3.55518865554412 occ= 0
MO #37 energy= 3.60974761183145 occ= 0
MO #38 energy= 3.6151833755771 occ= 0
MO #39 energy= 3.96421035205043 occ= 0
MO #40 energy= 4.05321846213409 occ= 0
MO #41 energy= 4.05517739148084 occ= 0
MO #42 energy= 4.12473981358761 occ= 0
MO #43 energy= 4.56056537756262 occ= 0
MO #44 energy= 4.58435732098983 occ= 0
MO #45 energy= 4.73384117166823 occ= 0
MO #46 energy= 4.91648601245221 occ= 0
MO #47 energy= 5.54141548172266 occ= 0
MO #48 energy= 6.08296384212195 occ= 0
MO #49 energy= 6.32455220739404 occ= 0
MO #50 energy= 6.33707844352242 occ= 0
MO #51 energy= 6.46267317107576 occ= 0
MO #52 energy= 6.49223865377348 occ= 0
MO #53 energy= 6.50236887302774 occ= 0
MO #54 energy= 6.71797444670698 occ= 0
MO #55 energy= 6.78370175375664 occ= 0
MO #56 energy= 7.30334068696683 occ= 0
MO #57 energy= 7.33175830779937 occ= 0
MO #58 energy= 7.5679760373653 occ= 0
MO #59 energy= 43.436205078353 occ= 0
** Mulliken atomic charges **
charge of 0O = -0.67873
charge of 1H = 0.33936
charge of 2H = 0.33936
Dipole moment(X, Y, Z, Debye): 0.00000, 0.00000, -1.94381
((array([ 1.99980742e+00, 1.64073176e+00, 5.49375332e-03, 8.70030710e-04, 5.08039632e-05, 1.34854233e+00, 1.97792842e+00, 1.67232550e+00, 5.95250816e-03, 9.95958234e-03, 3.58036226e-03, 1.59530251e-07, 2.17586562e-06, 2.76104485e-04, 1.25119538e-20, 2.49017054e-03, 2.97840701e-03, 4.33219390e-03, 1.76577009e-03, -3.19599622e-22, 1.47241434e-04, 2.15374727e-04, 8.87569993e-05, 1.15468095e-04, 1.99714516e-04, 1.09271369e-21, 2.73351493e-05, 5.82359286e-05, 4.11414426e-06, 5.99425750e-04, 1.84269929e-04, 6.41352917e-01, 6.14327280e-03, 4.87674998e-03, 6.44484767e-04, 4.60035700e-03, 2.30830273e-03, 1.42970984e-04, 2.58059627e-07, 3.58278604e-04, 3.70260005e-06, 1.83610566e-05, 1.29115343e-05, 4.93534685e-05, 1.24381905e-04, 6.41352917e-01, 6.14327280e-03, 4.87674998e-03, 6.44484767e-04, 4.60035700e-03, 2.30830273e-03, 1.42970984e-04, 2.58059627e-07, 3.58278604e-04, 3.70260005e-06, 1.83610566e-05, 1.29115343e-05, 4.93534685e-05, 1.24381905e-04]), array([-0.67872739, 0.3393637 , 0.3393637 ])), array([ 1.14543145e-13, 4.21425648e-14, -1.94381215e+00]))
Performance of GPU4PySCF on T4
Now we use a realistic example, the vitamin C molecule, to show the speedup of GPU4PySCF over PySCF on CPU.
Here is the geometry of the vitamin C molecule.
Let us first run the SCF and gradient calculations on CPU. This takes about 6 minutes, so we take a break and record the wall-clock time.
compute time for energy: 214.773 s
Now let us run the same tasks on GPU. This takes about half a minute.
This is already about 10x faster, even though the NVIDIA Tesla T4 delivers only about 250 GFLOPS in FP64! More powerful GPUs such as the A100 and H100 reach 19.5 and 67 TFLOPS in FP64 (with Tensor Cores), respectively. Running GPU4PySCF on those GPUs gives even better efficiency.
Some Useful Examples
We take the water molecule as an example and import the common modules.
Dispersion Correction & Nonlocal Correction
B3LYP functional with density fitting and D3(BJ) dispersion correction
ωB97M-V functional with density fitting and VV10 nonlocal correction
Geometry Optimization & Transition State Search
We use geomeTRIC for geometry optimization and transition state search. geomeTRIC is already installed as a dependency of GPU4PySCF.
Solvation Free Energy using SMD model
Open-Shell Calculations
Density Fitting MP2



