GPU4PySCF QuickStart
Tags: quantum chemistry, computational chemistry, electronic structure, GPU, DFT, Python
Xiaojie Wu
Updated 2024-09-04
Recommended image: Basic Image:ubuntu:22.04-py3.10-cuda12.1
Recommended machine: c4_m15_1 * NVIDIA T4
Prerequisite
Check GPU availability
Install GPU4PySCF from PyPI
Quick Start
Basics of PySCF
to_gpu && to_cpu
Performance of GPU4PySCF on T4
Some Useful Examples
Dispersion Correction & Nonlocal Correction
Geometry Optimization & Transition State Search
Solvation Free Energy using SMD model
Open-Shell Calculations
Density Fitting MP2

GPU4PySCF is a GPU plugin for PySCF, focusing on performance and industrial applications.

GitHub Repo (https://github.com/pyscf/gpu4pyscf)

This notebook was created by Xiaojie Wu (wxj6000@gmail.com).


Prerequisite

This notebook assumes basic knowledge of molecular simulation and of the fundamentals of PySCF.

The examples below are expected to run in an environment with:

  • CUDA 12.x
  • Python3
  • Tesla T4, 16 GB memory
  • CuPy >= 13.0
  • GPU4PySCF >= 1.0.0
  • PySCF >= 2.6.0
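
The version requirements above can be checked from Python before running anything heavy. The following is a minimal sketch, not part of the original notebook: it assumes the CUDA 12.x distribution names that pip reports later in this notebook (`cupy-cuda12x`, `gpu4pyscf-cuda12x`), and the helper names `parse_version`, `meets_minimum`, and `check_environment` are made up for illustration. It uses only the standard library, so the version comparison is deliberately simple.

```python
from importlib import metadata

# Minimum versions from the prerequisite list. The distribution names are
# assumed to be the CUDA 12.x ones; adjust them for a CUDA 11.x install.
MINIMUMS = {
    "cupy-cuda12x": "13.0",
    "gpu4pyscf-cuda12x": "1.0.0",
    "pyscf": "2.6.0",
}

def parse_version(text, width=3):
    """Turn '2.6.2' into (2, 6, 2); non-numeric suffixes are ignored."""
    parts = []
    for field in text.split(".")[:width]:
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad with zeros so that '1.0' and '1.0.0' compare equal.
    return tuple(parts + [0] * (width - len(parts)))

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)

def check_environment(minimums=MINIMUMS):
    """Return {distribution: (installed_version_or_None, ok)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report

for name, (installed, ok) in check_environment().items():
    status = "ok" if ok else "check"
    print(f"{name}: {installed or 'not installed'} ({status})")
```

A package reported as "not installed" here may simply be installed under a different distribution name, so treat the report as a hint rather than a verdict.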

Check GPU availability

Make sure a GPU is attached before running the cells below.

!nvidia-smi
Wed Jun  5 02:22:51 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

A Tesla T4 is available and the CUDA driver is installed.
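
The same check can be done programmatically, which is handy in scripts that should fail early on a CPU-only node. This is a minimal sketch, not part of the original notebook: it shells out to `nvidia-smi` with its `--query-gpu=name` option and returns an empty list when the tool is missing or fails. The function name `list_gpus` is made up for illustration.

```python
import shutil
import subprocess

def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none can be queried."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools are not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi exists but could not talk to a device
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print("GPUs found:", gpus if gpus else "none")
```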


Install GPU4PySCF from PyPI

GPU4PySCF is released as two packages on PyPI: gpu4pyscf-cuda11x for CUDA 11.x and gpu4pyscf-cuda12x for CUDA 12.x.

Our current environment uses CUDA 12.x, so we install the latest gpu4pyscf-cuda12x.

!pip3 install gpu4pyscf-cuda12x
!pip3 install cutensor-cu12
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: gpu4pyscf-cuda12x in /opt/mamba/lib/python3.10/site-packages (1.0)
Requirement already satisfied: pyscf~=2.6.0 in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (2.6.2)
Requirement already satisfied: geometric in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.0.2)
Requirement already satisfied: gpu4pyscf-libxc-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (0.4)
Requirement already satisfied: cupy-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (13.2.0)
Requirement already satisfied: pyscf-dispersion in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.1.0)
Requirement already satisfied: scipy!=1.5.0,!=1.5.1 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.11.4)
Requirement already satisfied: numpy!=1.16,!=1.17,>=1.13 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.26.2)
Requirement already satisfied: h5py>=2.7 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (3.10.0)
Requirement already satisfied: setuptools in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (65.5.0)
Requirement already satisfied: fastrlock>=0.5 in /opt/mamba/lib/python3.10/site-packages (from cupy-cuda12x->gpu4pyscf-cuda12x) (0.8.2)
Requirement already satisfied: networkx in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (3.3)
Requirement already satisfied: six in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cutensor-cu12
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ed/d6/61fc3511bc9e4cdb423b69964e3d344090b4093cbf9d3c8cc469ef4642d0/cutensor_cu12-2.0.2-py3-none-manylinux2014_x86_64.whl (156.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.9/156.9 MB 2.2 MB/s eta 0:00:0000:0100:01
Installing collected packages: cutensor-cu12
Successfully installed cutensor-cu12-2.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
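
The choice between the two PyPI packages depends only on the CUDA major version. A minimal sketch of that mapping follows. The function name `gpu4pyscf_package` is made up for illustration, and the version string is assumed to look like the "CUDA Version: 12.2" field printed by `nvidia-smi`.

```python
def gpu4pyscf_package(cuda_version):
    """Map a CUDA version string such as '12.2' to the PyPI package name."""
    major = int(cuda_version.split(".")[0])
    if major == 11:
        return "gpu4pyscf-cuda11x"
    if major == 12:
        return "gpu4pyscf-cuda12x"
    # Only the two packages named in this notebook are handled here.
    raise ValueError(f"no GPU4PySCF package listed for CUDA {cuda_version}")

print(gpu4pyscf_package("12.2"))  # prints gpu4pyscf-cuda12x
```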

Quick Start


Basics of PySCF

GPU4PySCF has the same syntax as PySCF. Let us take a water molecule as a simple example to show some basics of PySCF. We compute the SCF energy, the analytical gradient, and the analytical Hessian with density fitting. See the PySCF documentation for the detailed APIs.

import pyscf
from gpu4pyscf.dft import rks

atom ='''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''

mol = pyscf.M(
    atom=atom,                 # can be a string, list, or xyz filename
    charge=0,                  # total charge
    spin=None,                 # if None, spin = (number of electrons) % 2
    basis='def2-tzvpp',        # basis set
    verbose=1,                 # print level
    output='pyscf.log'         # log file
)

mf = rks.RKS(
    mol,
    xc='b3lyp'                 # xc functional, e.g. PBE, TPSS, wb97m-v
).density_fit()                # use density fitting

mf.grids.atom_grid = (99,590)  # 99 radial points, 590-point Lebedev angular grid
mf.conv_tol = 1e-10            # SCF convergence tolerance
mf.max_cycle = 50              # max number of SCF iterations

e_dft = mf.kernel() # compute total energy

g = mf.nuc_grad_method() # create a gradient object
g_dft = g.kernel() # compute analytical gradient

h = mf.Hessian() # create a Hessian object
h_dft = h.kernel() # compute analytical Hessian

output file: pyscf.log
/opt/mamba/lib/python3.10/site-packages/pyscf/dft/libxc.py:1110: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
  warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
/opt/mamba/lib/python3.10/site-packages/cupy/cuda/compiler.py:233: PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...
  jitify._init_module()

to_gpu && to_cpu

to_gpu converts a PySCF object into a GPU4PySCF object.

to_cpu converts a GPU4PySCF object into a PySCF object.

Note: not all PySCF classes implement to_gpu. Please check the GitHub repo for the supported functionality.

import pyscf
from pyscf.dft import rks

atom ='''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''

mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit().to_gpu() # move PySCF object to GPU4PySCF object
e_dft = mf.kernel() # compute total energy

converged SCF energy = -75.2427927513248
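
Because not every class supports to_gpu, a script that should also run on unsupported methods can fall back to the CPU object. The following is a minimal sketch of that pattern, not GPU4PySCF's own mechanism. The helper name `to_gpu_or_stay` is made up, and which exceptions an unsupported class actually raises is an assumption (here `ImportError` and `NotImplementedError`). The stand-in classes only make the example runnable without PySCF.

```python
def to_gpu_or_stay(method):
    """Return (object, on_gpu); keep the original object if conversion fails."""
    convert = getattr(method, "to_gpu", None)
    if convert is None:
        return method, False           # the class has no to_gpu at all
    try:
        return convert(), True
    except (ImportError, NotImplementedError):
        # Assumed failure modes: GPU4PySCF not installed, or class not supported.
        return method, False

class NoGpuMethod:                     # stand-in for a class without to_gpu
    pass

class GpuMethod:                       # stand-in whose to_gpu succeeds
    def to_gpu(self):
        return "gpu-object"

class UnsupportedMethod:               # stand-in whose to_gpu is not implemented
    def to_gpu(self):
        raise NotImplementedError

for obj in (NoGpuMethod(), GpuMethod(), UnsupportedMethod()):
    _, on_gpu = to_gpu_or_stay(obj)
    print(type(obj).__name__, "-> GPU" if on_gpu else "-> CPU")
```

With a real PySCF object the call would look like `mf, on_gpu = to_gpu_or_stay(rks.RKS(mol, xc='LDA').density_fit())`.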

The GPU task is now done. We convert the object back to the CPU and apply methods that are available in PySCF.

mf = mf.to_cpu()
mf.verbose = 3
mf.analyze()
**** MO energy ****
MO #1   energy= -18.5309130361177  occ= 2
MO #2   energy= -0.867406722063318 occ= 2
MO #3   energy= -0.430552861990662 occ= 2
MO #4   energy= -0.287810535694673 occ= 2
MO #5   energy= -0.213845863536424 occ= 2
MO #6   energy= 0.0279360174058646 occ= 0
MO #7   energy= 0.103970833518282  occ= 0
MO #8   energy= 0.365852236362224  occ= 0
MO #9   energy= 0.401817990005324  occ= 0
MO #10  energy= 0.417092793857731  occ= 0
MO #11  energy= 0.427089684574495  occ= 0
MO #12  energy= 0.533162356854433  occ= 0
MO #13  energy= 0.559423477534688  occ= 0
MO #14  energy= 0.642324854221829  occ= 0
MO #15  energy= 0.671254075360752  occ= 0
MO #16  energy= 0.763571220143496  occ= 0
MO #17  energy= 0.997977782675208  occ= 0
MO #18  energy= 1.15599649400448   occ= 0
MO #19  energy= 1.21701454122243   occ= 0
MO #20  energy= 1.69490924488968   occ= 0
MO #21  energy= 1.73123719335389   occ= 0
MO #22  energy= 1.76304918092714   occ= 0
MO #23  energy= 1.79613298056702   occ= 0
MO #24  energy= 1.8517827395061    occ= 0
MO #25  energy= 2.06436606632415   occ= 0
MO #26  energy= 2.33338168329192   occ= 0
MO #27  energy= 2.48177986111836   occ= 0
MO #28  energy= 2.66086177703876   occ= 0
MO #29  energy= 3.02456037955242   occ= 0
MO #30  energy= 3.12254899147704   occ= 0
MO #31  energy= 3.23429663064129   occ= 0
MO #32  energy= 3.2351676434303    occ= 0
MO #33  energy= 3.259149997086     occ= 0
MO #34  energy= 3.34031726662165   occ= 0
MO #35  energy= 3.49577379650828   occ= 0
MO #36  energy= 3.55518865554412   occ= 0
MO #37  energy= 3.60974761183145   occ= 0
MO #38  energy= 3.6151833755771    occ= 0
MO #39  energy= 3.96421035205043   occ= 0
MO #40  energy= 4.05321846213409   occ= 0
MO #41  energy= 4.05517739148084   occ= 0
MO #42  energy= 4.12473981358761   occ= 0
MO #43  energy= 4.56056537756262   occ= 0
MO #44  energy= 4.58435732098983   occ= 0
MO #45  energy= 4.73384117166823   occ= 0
MO #46  energy= 4.91648601245221   occ= 0
MO #47  energy= 5.54141548172266   occ= 0
MO #48  energy= 6.08296384212195   occ= 0
MO #49  energy= 6.32455220739404   occ= 0
MO #50  energy= 6.33707844352242   occ= 0
MO #51  energy= 6.46267317107576   occ= 0
MO #52  energy= 6.49223865377348   occ= 0
MO #53  energy= 6.50236887302774   occ= 0
MO #54  energy= 6.71797444670698   occ= 0
MO #55  energy= 6.78370175375664   occ= 0
MO #56  energy= 7.30334068696683   occ= 0
MO #57  energy= 7.33175830779937   occ= 0
MO #58  energy= 7.5679760373653    occ= 0
MO #59  energy= 43.436205078353    occ= 0
 ** Mulliken atomic charges  **
charge of    0O =     -0.67873
charge of    1H =      0.33936
charge of    2H =      0.33936
Dipole moment(X, Y, Z, Debye):  0.00000,  0.00000, -1.94381
((array([ 1.99980742e+00,  1.64073176e+00,  5.49375332e-03,  8.70030710e-04,
          5.08039632e-05,  1.34854233e+00,  1.97792842e+00,  1.67232550e+00,
          5.95250816e-03,  9.95958234e-03,  3.58036226e-03,  1.59530251e-07,
          2.17586562e-06,  2.76104485e-04,  1.25119538e-20,  2.49017054e-03,
          2.97840701e-03,  4.33219390e-03,  1.76577009e-03, -3.19599622e-22,
          1.47241434e-04,  2.15374727e-04,  8.87569993e-05,  1.15468095e-04,
          1.99714516e-04,  1.09271369e-21,  2.73351493e-05,  5.82359286e-05,
          4.11414426e-06,  5.99425750e-04,  1.84269929e-04,  6.41352917e-01,
          6.14327280e-03,  4.87674998e-03,  6.44484767e-04,  4.60035700e-03,
          2.30830273e-03,  1.42970984e-04,  2.58059627e-07,  3.58278604e-04,
          3.70260005e-06,  1.83610566e-05,  1.29115343e-05,  4.93534685e-05,
          1.24381905e-04,  6.41352917e-01,  6.14327280e-03,  4.87674998e-03,
          6.44484767e-04,  4.60035700e-03,  2.30830273e-03,  1.42970984e-04,
          2.58059627e-07,  3.58278604e-04,  3.70260005e-06,  1.83610566e-05,
          1.29115343e-05,  4.93534685e-05,  1.24381905e-04]),
  array([-0.67872739,  0.3393637 ,  0.3393637 ])),
 array([ 1.14543145e-13,  4.21425648e-14, -1.94381215e+00]))

Performance of GPU4PySCF on T4

Now we use a more realistic example, the vitamin C molecule, to show the speedup of GPU4PySCF over PySCF on CPU.

Here is the geometry of the vitamin C molecule.

atom = '''
C -0.07551087 1.68127663 -0.10745193
O 1.33621755 1.87147409 -0.39326987
C 1.67074668 2.95729545 0.49387976
C 0.41740763 3.77281969 0.78495878
C -0.60481480 3.07572636 0.28906224
H -0.19316298 1.01922455 0.72486113
O 0.35092043 5.03413298 1.45545728
H 0.42961487 5.74279041 0.81264173
O -1.95331750 3.53349874 0.15912025
H -2.55333895 2.78846397 0.23972698
O 2.81976302 3.20110148 0.94542226
C -0.81772499 1.09230218 -1.32146482
H -0.70955636 1.74951833 -2.15888136
C -2.31163857 0.93420736 -0.98260166
H -2.72575463 1.89080093 -0.74107186
H -2.41980721 0.27699120 -0.14518512
O -0.26428017 -0.18613595 -1.64425697
H -0.72695910 -0.55328886 -2.40104423
O -3.00083741 0.38730252 -2.10989934
H -3.93210821 0.28874990 -1.89865997
'''

Let us run the SCF and gradient calculations on the CPU first and record the wall-clock time. This takes about 6 minutes, so it is a good moment for a break.

import pyscf
import time
from pyscf.dft import rks, uks

start_time = time.time()
mol = pyscf.M(atom=atom, basis='def2-tzvpp',verbose=1)

mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.verbose = 1

mf.grids.atom_grid = (99,590)
mf.conv_tol = 1e-10
mf.chkfile = None
e_tot = mf.kernel()
scf_time = time.time() - start_time
print(f'compute time for energy: {scf_time:.3f} s')

start_time = time.time()
g = mf.nuc_grad_method()
g.auxbasis_response = True
f = g.kernel()
grad_time = time.time() - start_time
print(f'compute time for gradient: {grad_time:.3f} s')

compute time for energy: 214.773 s
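
The start/stop timing pattern used above repeats for every step, so it can be wrapped in a small context manager. This is a minimal sketch, not part of the original notebook: the name `wall_clock` is made up, and a trivial Python loop stands in for `mf.kernel()` so the example runs without PySCF. It uses `time.perf_counter`, which is monotonic and therefore safer for intervals than `time.time`.

```python
import time
from contextlib import contextmanager

@contextmanager
def wall_clock(label, results=None):
    """Print (and optionally store) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"compute time for {label}: {elapsed:.3f} s")

timings = {}
with wall_clock("demo", timings):
    total = sum(i * i for i in range(100_000))  # stand-in workload
```

In the notebook this would read `with wall_clock("energy", timings): e_tot = mf.kernel()`, and the stored values make it easy to compare the CPU and GPU runs afterwards.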

Now let us run the same tasks on the GPU. This takes about half a minute.

import pyscf
import time
from gpu4pyscf.dft import rks, uks

start_time = time.time()
mol = pyscf.M(atom=atom, basis='def2-tzvpp',verbose=1)

mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.verbose = 1

mf.grids.atom_grid = (99,590)
mf.conv_tol = 1e-10
mf.chkfile = None
e_tot = mf.kernel()
scf_time = time.time() - start_time
print(f'compute time for energy: {scf_time:.3f} s')

start_time = time.time()
g = mf.nuc_grad_method()
g.auxbasis_response = True
f = g.kernel()
grad_time = time.time() - start_time
print(f'compute time for gradient: {grad_time:.3f} s')


The GPU run is already about 10x faster, even though the NVIDIA Tesla T4 delivers only about 250 GFLOPS in FP64. More powerful GPUs such as the A100 and H100 have peak FP64 throughput of 19.5 and 67 TFLOPS, respectively (Tensor Core figures), so GPU4PySCF should run more efficiently still on them.
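
To put the peak figures quoted above side by side, the short sketch below computes their ratios. It only does arithmetic on the numbers in the text. Peak-throughput ratios are an upper bound on what the hardware can deliver, not a predicted speedup for any particular calculation.

```python
# Peak FP64 throughput in TFLOPS, as quoted in the text above.
PEAK_FP64_TFLOPS = {"T4": 0.25, "A100": 19.5, "H100": 67.0}

def peak_ratio(gpu, baseline="T4"):
    """Ratio of peak FP64 throughput between two GPUs."""
    return PEAK_FP64_TFLOPS[gpu] / PEAK_FP64_TFLOPS[baseline]

for name in ("A100", "H100"):
    print(f"{name}: {peak_ratio(name):.0f}x the T4 peak")
# prints:
# A100: 78x the T4 peak
# H100: 268x the T4 peak
```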


Some Useful Examples


We take the water molecule as an example and import the common modules.

water ='''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''
from gpu4pyscf.dft import rks
import pyscf
mol = pyscf.M(atom=water, basis='def2-tzvpp')

Dispersion Correction & Nonlocal Correction


B3LYP functional with density fitting and D3(BJ) dispersion correction

mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.disp = 'd3bj'
mf.kernel()

ωB97M-V functional with density fitting and VV10 nonlocal correction

mf = rks.RKS(mol, xc='wb97m-v').density_fit()
mf.kernel()

Geometry Optimization & Transition State Search


We use geomeTRIC for geometry optimization and transition state search. geomeTRIC is already installed as a dependency of GPU4PySCF.

from pyscf.geomopt.geometric_solver import optimize
mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.disp = 'd3bj'
mol_eq = optimize(mf, maxsteps=50)
print("Optimized equilibrium state:")
print(mol_eq.atom_coords())

mol_ts = optimize(mf, maxsteps=50, transition=True)
print("Optimized transition state:")
print(mol_ts.atom_coords())
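
Raw coordinates are hard to read, so it often helps to turn them into bond lengths and angles. The sketch below is not part of the original notebook: it applies plain geometry to the starting water coordinates used in this notebook (in Angstrom), and the helper names are made up for illustration. Note that `atom_coords()` returns Bohr by default, so optimized coordinates would first need the conversion shown in `bohr_to_angstrom`.

```python
import math

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value

# Starting geometry of water from this notebook, in Angstrom.
o = (0.0, 0.0, 0.1174)
h1 = (-0.757, 0.0, -0.4696)
h2 = (0.757, 0.0, -0.4696)

def bohr_to_angstrom(coords):
    """Convert a list of (x, y, z) tuples from Bohr to Angstrom."""
    return [tuple(c * BOHR_TO_ANGSTROM for c in xyz) for xyz in coords]

def distance(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)

def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

print(f"O-H bond length: {distance(o, h1):.4f} Angstrom")
print(f"H-O-H angle:     {angle_deg(h1, o, h2):.2f} degrees")
```

The same two functions can be applied to `bohr_to_angstrom(mol_eq.atom_coords().tolist())` to see how the optimized structure differs from the starting one.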

Solvation Free Energy using SMD model

mf = rks.RKS(mol, xc='B3LYP').density_fit()
e_gas = mf.kernel()
mf = mf.SMD()
mf.with_solvent.solvent = 'water'
e_sol = mf.kernel()
print('Solvation Free Energy:', e_sol - e_gas, 'Hartree')
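
The energy difference above is printed in Hartree, while solvation free energies are usually quoted in kcal/mol. A minimal conversion sketch follows. The conversion factors are standard CODATA-based values, and the example energy is a made-up number used only to show the call, not a result from this notebook.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5094740631
HARTREE_TO_EV = 27.211386245988

def hartree_to_kcal_per_mol(energy_hartree):
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL

def hartree_to_ev(energy_hartree):
    """Convert an energy from Hartree to electronvolts."""
    return energy_hartree * HARTREE_TO_EV

example = -0.01  # hypothetical energy difference in Hartree
print(f"{example} Hartree = {hartree_to_kcal_per_mol(example):.3f} kcal/mol")
```

In the notebook the call would be `hartree_to_kcal_per_mol(e_sol - e_gas)`.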

Open-Shell Calculations

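from gpu4pyscf.dft import uks   # unrestricted Kohn-Sham for open-shell systems
mol_open = mol.copy()
mol_open.charge = 1             # water cation
mol_open.spin = 1               # one unpaired electron (spin = N_alpha - N_beta)
mol_open.build()                # rebuild so the new charge and spin take effect
mf = uks.UKS(mol_open, xc='b3lyp').density_fit()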
mf.kernel()
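
The charge and spin settings have to be consistent with the electron count: in PySCF, spin is the number of unpaired electrons (N_alpha minus N_beta), so it must have the same parity as the total number of electrons. The sketch below is not part of the original notebook. It checks this with plain arithmetic, and the helper names and the small element table are made up for illustration.

```python
# Atomic numbers for the elements used in this notebook.
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}

def electron_count(symbols, charge=0):
    """Total number of electrons for the given atoms and total charge."""
    return sum(ATOMIC_NUMBER[s] for s in symbols) - charge

def default_spin(symbols, charge=0):
    """Spin used when spin=None: (number of electrons) % 2."""
    return electron_count(symbols, charge) % 2

def spin_is_consistent(symbols, charge, spin):
    """True if N_alpha and N_beta come out as non-negative integers."""
    n = electron_count(symbols, charge)
    return 0 <= spin <= n and (n - spin) % 2 == 0

water = ["O", "H", "H"]
print("neutral water:", electron_count(water), "electrons, default spin", default_spin(water))
print("water cation: ", electron_count(water, 1), "electrons, default spin", default_spin(water, 1))
```

Neutral water has 10 electrons and is closed-shell, while the cation has 9, which is why the open-shell example sets spin to 1 and uses UKS.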

Density Fitting MP2

from gpu4pyscf import scf, mp
mf = scf.RHF(mol).density_fit()
e_hf = mf.kernel()

ptobj = mp.dfmp2.DFMP2(mf)
e_corr, t2 = ptobj.kernel()
e_mp2 = e_hf + e_corr