GPU4PySCF QuickStart

Tags: quantum chemistry, computational chemistry, electronic structure, GPU, DFT, Python

Xiaojie Wu

Updated 2024-09-04

Recommended image: Basic Image: ubuntu:22.04-py3.10-cuda12.1

Recommended machine type: c4_m15_1 * NVIDIA T4

Prerequisite

Check GPU availability

Install GPU4PySCF from PyPI

Quick Start

Basics of PySCF

to_gpu && to_cpu

Performance of GPU4PySCF on T4

Some Useful Examples

Dispersion Correction & Nonlocal Correction

Geometry Optimization & Transition State Search

Solvation Free Energy using SMD model

Open-Shell Calculations

Density Fitting MP2

GPU4PySCF is a GPU plugin for PySCF, focusing on performance and industrial applications.

GitHub Repo (https://github.com/pyscf/gpu4pyscf)

This notebook is created by Xiaojie Wu (wxj6000@gmail.com)

Prerequisite

Basic knowledge of molecular simulation and the fundamentals of PySCF are required for this notebook.

The examples below are expected to run in an environment with:

CUDA 12.x
Python3
Tesla T4, 16 GB memory
CuPy >= 13.0
GPU4PySCF >= 1.0.0
PySCF >= 2.6.0
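A minimal sketch for checking the Python package minimums above from inside the notebook. It assumes the distributions are installed under the names that appear in this notebook's pip log (`cupy-cuda12x`, `gpu4pyscf-cuda12x`, `pyscf`). The version comparison is deliberately simple and ignores pre-release suffixes. `MINIMUM_VERSIONS`, `parse_release`, `meets_minimum` and `check_environment` are illustrative names, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above; distribution names assume the CUDA 12 wheels.
MINIMUM_VERSIONS = {
    "cupy-cuda12x": "13.0",
    "gpu4pyscf-cuda12x": "1.0.0",
    "pyscf": "2.6.0",
}


def parse_release(text: str) -> tuple:
    """Turn '2.6.2' into (2, 6, 2); stop at the first component without leading digits."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding the shorter one with zeros ('1.0' == '1.0.0')."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict:
    """Map each distribution to True/False, or None if it is not installed."""
    report = {}
    for dist, minimum in MINIMUM_VERSIONS.items():
        try:
            report[dist] = meets_minimum(version(dist), minimum)
        except PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    print(check_environment())
```

The CUDA and GPU requirements are not Python packages, so they are checked separately with `nvidia-smi`.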

Check GPU availability

Please make sure a GPU is connected before running the following cells.


```
!nvidia-smi
```

```
Wed Jun  5 02:22:51 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

A Tesla T4 is available and the NVIDIA driver is installed (driver 535.104.05, which supports CUDA up to 12.2).
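The same check can be done from Python, for example in a script that should stop early when no GPU is present. This is a minimal sketch that assumes `nvidia-smi` is on `PATH`. Instead of parsing the table printed above, it asks `nvidia-smi` for machine-readable output with its `--query-gpu` and `--format=csv,noheader` options. `parse_gpu_query` and `query_gpus` are illustrative helpers.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text: str) -> list:
    """Parse 'name, memory.total' rows such as 'Tesla T4, 15360 MiB'."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps GPU names that contain commas intact
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(memory.split()[0])))  # total memory in MiB
    return gpus


def query_gpus() -> list:
    """Return [(name, total_memory_MiB), ...], or [] if no usable nvidia-smi is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []  # nvidia-smi exists but cannot talk to a driver
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    print(gpus if gpus else "No NVIDIA GPU detected")
```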

Install GPU4PySCF from PyPI

GPU4PySCF is released as two packages on PyPI: gpu4pyscf-cuda11x for CUDA 11.x and gpu4pyscf-cuda12x for CUDA 12.x.

Our environment uses CUDA 12.x, so we install the latest gpu4pyscf-cuda12x together with cutensor-cu12.
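The choice between the two packages can be scripted from the `CUDA Version` field printed by `nvidia-smi`. Treat this sketch as a heuristic. That field reports the highest CUDA version the driver supports, which is not necessarily the toolkit your other wheels were built for. `PACKAGES` and `pick_package` are illustrative names.

```python
import re

# Package names from the text above, keyed by CUDA major version.
PACKAGES = {11: "gpu4pyscf-cuda11x", 12: "gpu4pyscf-cuda12x"}


def pick_package(nvidia_smi_text: str) -> str:
    """Choose the wheel name from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    major = int(match.group(1))
    if major not in PACKAGES:
        raise ValueError(f"no GPU4PySCF package listed for CUDA {major}.x")
    return PACKAGES[major]


# Header line taken from the nvidia-smi output above.
header = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
print(pick_package(header))  # gpu4pyscf-cuda12x
```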


```
!pip3 install gpu4pyscf-cuda12x
!pip3 install cutensor-cu12
```

```
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: gpu4pyscf-cuda12x in /opt/mamba/lib/python3.10/site-packages (1.0)
Requirement already satisfied: pyscf~=2.6.0 in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (2.6.2)
Requirement already satisfied: geometric in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.0.2)
Requirement already satisfied: gpu4pyscf-libxc-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (0.4)
Requirement already satisfied: cupy-cuda12x in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (13.2.0)
Requirement already satisfied: pyscf-dispersion in /opt/mamba/lib/python3.10/site-packages (from gpu4pyscf-cuda12x) (1.1.0)
Requirement already satisfied: scipy!=1.5.0,!=1.5.1 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.11.4)
Requirement already satisfied: numpy!=1.16,!=1.17,>=1.13 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (1.26.2)
Requirement already satisfied: h5py>=2.7 in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (3.10.0)
Requirement already satisfied: setuptools in /opt/mamba/lib/python3.10/site-packages (from pyscf~=2.6.0->gpu4pyscf-cuda12x) (65.5.0)
Requirement already satisfied: fastrlock>=0.5 in /opt/mamba/lib/python3.10/site-packages (from cupy-cuda12x->gpu4pyscf-cuda12x) (0.8.2)
Requirement already satisfied: networkx in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (3.3)
Requirement already satisfied: six in /opt/mamba/lib/python3.10/site-packages (from geometric->gpu4pyscf-cuda12x) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cutensor-cu12
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ed/d6/61fc3511bc9e4cdb423b69964e3d344090b4093cbf9d3c8cc469ef4642d0/cutensor_cu12-2.0.2-py3-none-manylinux2014_x86_64.whl (156.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.9/156.9 MB 2.2 MB/s eta 0:00:00
Installing collected packages: cutensor-cu12
Successfully installed cutensor-cu12-2.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
```

Quick Start

Basics of PySCF

GPU4PySCF uses the same syntax as PySCF. We take the water molecule as a simple example to show some PySCF basics: we compute the SCF energy, the analytical gradient, and the analytical Hessian with density fitting. See the PySCF documentation for the detailed APIs.


```python
import pyscf
from gpu4pyscf.dft import rks

atom = '''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''

mol = pyscf.M(atom=atom,            # can be a string, a list, or an xyz filename
              charge=0,             # total charge
              spin=None,            # if spin is None, spin = (number of electrons) % 2
              basis='def2-tzvpp',   # basis set
              verbose=1,            # controls how much is printed
              output='pyscf.log')   # log file

mf = rks.RKS(mol,
             xc='b3lyp'             # XC functional, e.g. PBE, TPSS, wb97m-v
             ).density_fit()        # use density fitting
mf.grids.atom_grid = (99, 590)      # 99 radial points x 590-point Lebedev angular grid
mf.conv_tol = 1e-10                 # SCF convergence tolerance
mf.max_cycle = 50                   # max number of SCF iterations
e_dft = mf.kernel()                 # compute total energy

g = mf.nuc_grad_method()            # create a gradient object
g_dft = g.kernel()                  # compute analytical gradient

h = mf.Hessian()                    # create a Hessian object
h_dft = h.kernel()                  # compute analytical Hessian
```

```
output file: pyscf.log
/opt/mamba/lib/python3.10/site-packages/pyscf/dft/libxc.py:1110: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
  warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
/opt/mamba/lib/python3.10/site-packages/cupy/cuda/compiler.py:233: PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...
  jitify._init_module()
```
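The `spin=None` comment in the cell above describes how the spin is chosen when it is not given. In PySCF, `spin` is 2S, the number of alpha minus beta electrons. The rule therefore reduces to the parity of the electron count. Here is a hand-written sketch of that rule. It assumes a PySCF-style "Symbol x y z" atom string and covers only H, C and O. `electron_count` and `default_spin` are illustrative helpers, not PySCF functions.

```python
# Atomic numbers for the elements used in this notebook (H, C, O only).
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}


def electron_count(atom_block: str, charge: int = 0) -> int:
    """Sum the nuclear charges in a 'Symbol x y z' block and subtract the total charge."""
    total = 0
    for line in atom_block.strip().splitlines():
        if line.strip():
            total += ATOMIC_NUMBER[line.split()[0]]
    return total - charge


def default_spin(atom_block: str, charge: int = 0) -> int:
    """2S chosen by the spin=None rule quoted above: number of electrons % 2."""
    return electron_count(atom_block, charge) % 2


water = '''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''
print(electron_count(water), default_spin(water))        # 10 0
print(electron_count(water, 1), default_spin(water, 1))  # 9 1
```

Neutral water has 10 electrons and is a closed shell. Removing one electron gives spin 1, which is the setting used for the cation in the open-shell example later in this notebook.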

`to_gpu` && `to_cpu`

`to_gpu` converts a PySCF object into a GPU4PySCF object.

`to_cpu` converts a GPU4PySCF object into a PySCF object.

Note: not all PySCF classes implement the `to_gpu` method. Please check the GitHub repo for the supported features.
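Not every class implements `to_gpu`. A script that should run either way can attempt the conversion and fall back to the CPU object if it fails. This is a minimal sketch. `maybe_to_gpu` is a hypothetical helper. The exceptions it treats as "not supported" are an assumption; adjust them to what your PySCF/GPU4PySCF versions actually raise.

```python
import warnings


def maybe_to_gpu(obj):
    """Try obj.to_gpu(); return (object, on_gpu), keeping the CPU object on failure."""
    convert = getattr(obj, "to_gpu", None)
    if convert is None:
        return obj, False
    try:
        return convert(), True
    except (ImportError, NotImplementedError, AttributeError) as err:
        # Assumed failure modes for classes without a GPU4PySCF counterpart.
        warnings.warn(f"to_gpu() unavailable for {type(obj).__name__}: {err!r}; staying on CPU")
        return obj, False
```

For example, `mf, on_gpu = maybe_to_gpu(rks.RKS(mol, xc='LDA').density_fit())` uses the GPU when the conversion succeeds and otherwise keeps the PySCF object.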


```python
import pyscf
from pyscf.dft import rks

atom = '''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''

mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit().to_gpu()  # convert the PySCF object into a GPU4PySCF object
e_dft = mf.kernel()                                  # compute total energy
```

```
converged SCF energy = -75.2427927513248
```

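Now the GPU calculation is done. We convert the object back to the CPU and apply a method available in PySCF, `analyze`, which prints the orbital energies, Mulliken charges, and dipole moment.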


```python
mf = mf.to_cpu()   # convert back to a PySCF (CPU) object
mf.verbose = 3     # print enough detail for analyze()
mf.analyze()
```

```
**** MO energy ****
MO #1   energy= -18.5309130361177  occ= 2
MO #2   energy= -0.867406722063318 occ= 2
MO #3   energy= -0.430552861990662 occ= 2
MO #4   energy= -0.287810535694673 occ= 2
MO #5   energy= -0.213845863536424 occ= 2
MO #6   energy= 0.0279360174058646 occ= 0
MO #7   energy= 0.103970833518282  occ= 0
MO #8   energy= 0.365852236362224  occ= 0
MO #9   energy= 0.401817990005324  occ= 0
MO #10  energy= 0.417092793857731  occ= 0
MO #11  energy= 0.427089684574495  occ= 0
MO #12  energy= 0.533162356854433  occ= 0
MO #13  energy= 0.559423477534688  occ= 0
MO #14  energy= 0.642324854221829  occ= 0
MO #15  energy= 0.671254075360752  occ= 0
MO #16  energy= 0.763571220143496  occ= 0
MO #17  energy= 0.997977782675208  occ= 0
MO #18  energy= 1.15599649400448   occ= 0
MO #19  energy= 1.21701454122243   occ= 0
MO #20  energy= 1.69490924488968   occ= 0
MO #21  energy= 1.73123719335389   occ= 0
MO #22  energy= 1.76304918092714   occ= 0
MO #23  energy= 1.79613298056702   occ= 0
MO #24  energy= 1.8517827395061    occ= 0
MO #25  energy= 2.06436606632415   occ= 0
MO #26  energy= 2.33338168329192   occ= 0
MO #27  energy= 2.48177986111836   occ= 0
MO #28  energy= 2.66086177703876   occ= 0
MO #29  energy= 3.02456037955242   occ= 0
MO #30  energy= 3.12254899147704   occ= 0
MO #31  energy= 3.23429663064129   occ= 0
MO #32  energy= 3.2351676434303    occ= 0
MO #33  energy= 3.259149997086     occ= 0
MO #34  energy= 3.34031726662165   occ= 0
MO #35  energy= 3.49577379650828   occ= 0
MO #36  energy= 3.55518865554412   occ= 0
MO #37  energy= 3.60974761183145   occ= 0
MO #38  energy= 3.6151833755771    occ= 0
MO #39  energy= 3.96421035205043   occ= 0
MO #40  energy= 4.05321846213409   occ= 0
MO #41  energy= 4.05517739148084   occ= 0
MO #42  energy= 4.12473981358761   occ= 0
MO #43  energy= 4.56056537756262   occ= 0
MO #44  energy= 4.58435732098983   occ= 0
MO #45  energy= 4.73384117166823   occ= 0
MO #46  energy= 4.91648601245221   occ= 0
MO #47  energy= 5.54141548172266   occ= 0
MO #48  energy= 6.08296384212195   occ= 0
MO #49  energy= 6.32455220739404   occ= 0
MO #50  energy= 6.33707844352242   occ= 0
MO #51  energy= 6.46267317107576   occ= 0
MO #52  energy= 6.49223865377348   occ= 0
MO #53  energy= 6.50236887302774   occ= 0
MO #54  energy= 6.71797444670698   occ= 0
MO #55  energy= 6.78370175375664   occ= 0
MO #56  energy= 7.30334068696683   occ= 0
MO #57  energy= 7.33175830779937   occ= 0
MO #58  energy= 7.5679760373653    occ= 0
MO #59  energy= 43.436205078353    occ= 0
 ** Mulliken atomic charges  **
charge of    0O =     -0.67873
charge of    1H =      0.33936
charge of    2H =      0.33936
Dipole moment(X, Y, Z, Debye):  0.00000,  0.00000, -1.94381

((array([ 1.99980742e+00,  1.64073176e+00,  5.49375332e-03,  8.70030710e-04,
          5.08039632e-05,  1.34854233e+00,  1.97792842e+00,  1.67232550e+00,
          5.95250816e-03,  9.95958234e-03,  3.58036226e-03,  1.59530251e-07,
          2.17586562e-06,  2.76104485e-04,  1.25119538e-20,  2.49017054e-03,
          2.97840701e-03,  4.33219390e-03,  1.76577009e-03, -3.19599622e-22,
          1.47241434e-04,  2.15374727e-04,  8.87569993e-05,  1.15468095e-04,
          1.99714516e-04,  1.09271369e-21,  2.73351493e-05,  5.82359286e-05,
          4.11414426e-06,  5.99425750e-04,  1.84269929e-04,  6.41352917e-01,
          6.14327280e-03,  4.87674998e-03,  6.44484767e-04,  4.60035700e-03,
          2.30830273e-03,  1.42970984e-04,  2.58059627e-07,  3.58278604e-04,
          3.70260005e-06,  1.83610566e-05,  1.29115343e-05,  4.93534685e-05,
          1.24381905e-04,  6.41352917e-01,  6.14327280e-03,  4.87674998e-03,
          6.44484767e-04,  4.60035700e-03,  2.30830273e-03,  1.42970984e-04,
          2.58059627e-07,  3.58278604e-04,  3.70260005e-06,  1.83610566e-05,
          1.29115343e-05,  4.93534685e-05,  1.24381905e-04]),
  array([-0.67872739,  0.3393637 ,  0.3393637 ])),
 array([ 1.14543145e-13,  4.21425648e-14, -1.94381215e+00]))
```
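The orbital energies printed by `analyze()` can be post-processed directly. As an example, here is a sketch that computes the HOMO-LUMO gap from orbital energies (in Hartree) and occupations. The numbers are the first seven orbitals copied from the printout above. On the CPU object, the same function could be applied to `mf.mo_energy` and `mf.mo_occ`. `homo_lumo_gap` is an illustrative helper.

```python
HARTREE_TO_EV = 27.211386245988  # 1 Hartree in eV (CODATA 2018)


def homo_lumo_gap(mo_energy, mo_occ):
    """Return (HOMO, LUMO, gap in eV) from orbital energies and occupations."""
    occupied = [e for e, n in zip(mo_energy, mo_occ) if n > 0]
    virtual = [e for e, n in zip(mo_energy, mo_occ) if n == 0]
    homo, lumo = max(occupied), min(virtual)
    return homo, lumo, (lumo - homo) * HARTREE_TO_EV


# First seven orbitals from the analyze() printout above (Hartree).
mo_energy = [-18.5309130361177, -0.867406722063318, -0.430552861990662,
             -0.287810535694673, -0.213845863536424, 0.0279360174058646,
             0.103970833518282]
mo_occ = [2, 2, 2, 2, 2, 0, 0]
homo, lumo, gap_ev = homo_lumo_gap(mo_energy, mo_occ)
print(f"HOMO {homo:.4f} Ha, LUMO {lumo:.4f} Ha, gap {gap_ev:.2f} eV")
```

With these values the sketch prints a gap of about 6.58 eV for this LDA calculation.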

Performance of GPU4PySCF on T4

Now we use a more realistic example, the vitamin C molecule, to show the speedup of GPU4PySCF over PySCF on CPU.

Here is the geometry of the vitamin C molecule.


```python
atom = '''
C -0.07551087 1.68127663 -0.10745193
O 1.33621755 1.87147409 -0.39326987
C 1.67074668 2.95729545 0.49387976
C 0.41740763 3.77281969 0.78495878
C -0.60481480 3.07572636 0.28906224
H -0.19316298 1.01922455 0.72486113
O 0.35092043 5.03413298 1.45545728
H 0.42961487 5.74279041 0.81264173
O -1.95331750 3.53349874 0.15912025
H -2.55333895 2.78846397 0.23972698
O 2.81976302 3.20110148 0.94542226
C -0.81772499 1.09230218 -1.32146482
H -0.70955636 1.74951833 -2.15888136
C -2.31163857 0.93420736 -0.98260166
H -2.72575463 1.89080093 -0.74107186
H -2.41980721 0.27699120 -0.14518512
O -0.26428017 -0.18613595 -1.64425697
H -0.72695910 -0.55328886 -2.40104423
O -3.00083741 0.38730252 -2.10989934
H -3.93210821 0.28874990 -1.89865997
'''
```
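Before a run that takes several minutes, it is cheap to confirm that the geometry is what we think it is. Vitamin C (ascorbic acid) is C6H8O6. Here is a sketch that rebuilds the molecular formula in Hill order and counts the electrons. The even count means the closed-shell `RKS` treatment used below is appropriate. `hill_formula` is an illustrative helper, and the symbol list is transcribed from the geometry above.

```python
from collections import Counter

# Atomic numbers for the elements present in the vitamin C geometry.
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}


def hill_formula(symbols) -> str:
    """Molecular formula in Hill order: C, then H, then the rest alphabetically.

    Without carbon, all elements (including H) are listed alphabetically.
    """
    counts = Counter(symbols)
    order = [el for el in ("C", "H") if el in counts] if "C" in counts else []
    order += sorted(el for el in counts if el not in order)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


# Element symbols of the 20 atoms in the geometry above, in input order.
symbols = "C O C C C H O H O H O C H C H H O H O H".split()
n_electrons = sum(ATOMIC_NUMBER[s] for s in symbols)
print(hill_formula(symbols), n_electrons)  # C6H8O6 92
```

With the notebook's `atom` string, the symbols can be extracted with `[line.split()[0] for line in atom.strip().splitlines() if line.strip()]`.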

Let us first run the SCF and gradient calculations on the CPU and record the wall-clock time. This takes about 6 minutes, so feel free to take a break.


```python
import pyscf
import time
from pyscf.dft import rks, uks   # CPU (PySCF) implementation

start_time = time.time()
mol = pyscf.M(atom=atom, basis='def2-tzvpp', verbose=1)
mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.verbose = 1
mf.grids.atom_grid = (99, 590)
mf.conv_tol = 1e-10
mf.chkfile = None                # do not write a checkpoint file
e_tot = mf.kernel()
scf_time = time.time() - start_time
print(f'compute time for energy: {scf_time:.3f} s')

start_time = time.time()
g = mf.nuc_grad_method()
g.auxbasis_response = True       # include the auxiliary-basis response in the DF gradient
f = g.kernel()
grad_time = time.time() - start_time
print(f'compute time for gradient: {grad_time:.3f} s')
```

```
compute time for energy: 214.773 s
```
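The timing pattern in this cell, and in the GPU cell below, repeats the same start/stop bookkeeping. A small context manager can factor it out. This sketch uses `time.perf_counter`, which is intended for measuring intervals, instead of `time.time`. `wall_timer` is a hypothetical helper.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_timer(label: str, results: dict):
    """Store the elapsed wall-clock time of the with-block in results[label] (seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f"compute time for {label}: {results[label]:.3f} s")


timings = {}
with wall_timer("demo", timings):
    time.sleep(0.1)  # stand-in for mf.kernel() or g.kernel()
```

In the cells above this would read `with wall_timer('energy', timings): e_tot = mf.kernel()`. Keeping CPU and GPU results in separate dictionaries also keeps both sets of numbers available for comparison.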

Now let us run the same tasks on the GPU. This takes about half a minute.


```python
import pyscf
import time
from gpu4pyscf.dft import rks, uks   # GPU (GPU4PySCF) implementation

start_time = time.time()
mol = pyscf.M(atom=atom, basis='def2-tzvpp', verbose=1)
mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.verbose = 1
mf.grids.atom_grid = (99, 590)
mf.conv_tol = 1e-10
mf.chkfile = None                    # do not write a checkpoint file
e_tot = mf.kernel()
scf_time = time.time() - start_time
print(f'compute time for energy: {scf_time:.3f} s')

start_time = time.time()
g = mf.nuc_grad_method()
g.auxbasis_response = True           # include the auxiliary-basis response in the DF gradient
f = g.kernel()
grad_time = time.time() - start_time
print(f'compute time for gradient: {grad_time:.3f} s')
```

GPU4PySCF is already about 10x faster, even though the NVIDIA Tesla T4 delivers only about 250 GFLOPS in FP64! More powerful GPUs such as the A100 and H100 reach peak FP64 throughputs (with Tensor Cores) of 19.5 and 67 TFLOPS, respectively. You can run GPU4PySCF on these GPUs for even better efficiency.
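The speedup and the hardware comparison above are simple ratios. This sketch uses the peak FP64 figures quoted in the text, with the T4 taken as 0.25 TFLOPS. `peak_ratio` and `speedup` are illustrative helpers. Peak-throughput ratios are only a rough hint. Actual GPU4PySCF timings also depend on memory bandwidth, memory size, and how well each kernel uses the hardware, so these ratios should not be read as predicted speedups.

```python
# Peak FP64 throughput quoted in the text above, in TFLOPS.
PEAK_FP64_TFLOPS = {"T4": 0.25, "A100": 19.5, "H100": 67.0}


def peak_ratio(gpu: str, baseline: str = "T4") -> float:
    """Ratio of quoted peak FP64 throughput; a rough hint only, not a predicted speedup."""
    return PEAK_FP64_TFLOPS[gpu] / PEAK_FP64_TFLOPS[baseline]


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Measured speedup = CPU wall time / GPU wall time."""
    return cpu_seconds / gpu_seconds


for name in ("A100", "H100"):
    print(f"{name}: {peak_ratio(name):.0f}x the T4 peak FP64")
```

Both timing cells above reuse the name `scf_time`. To get the measured speedup, save the CPU and GPU wall times under separate names and pass them to `speedup`.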

Some Useful Examples

We use the water molecule as an example and import the common modules.


```python
import pyscf
from gpu4pyscf.dft import rks, uks   # uks is needed for the open-shell example

water = '''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''

mol = pyscf.M(atom=water, basis='def2-tzvpp')
```

Dispersion Correction & Nonlocal Correction

B3LYP functional with density fitting and D3(BJ) dispersion correction


```python
mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.disp = 'd3bj'   # add the D3(BJ) dispersion correction
mf.kernel()
```
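For reference, here is the standard two-body form of the Becke-Johnson-damped D3 correction (Grimme and co-workers, 2011). In the usual D3(BJ) scheme this term is added on top of the DFT energy. $s_6$, $s_8$, $a_1$, $a_2$ are parameters fitted per functional, and $C_n^{AB}$ are coordination-number-dependent dispersion coefficients. This notebook does not state which variant the backend evaluates, for example whether a three-body term is included, so treat this as background rather than a description of the implementation.

```latex
E_{\text{disp}}^{\text{D3(BJ)}} = -\frac{1}{2}\sum_{A \neq B}\;\sum_{n=6,8} s_n\,
  \frac{C_n^{AB}}{r_{AB}^{\,n} + \left(a_1 R_0^{AB} + a_2\right)^{n}},
\qquad
R_0^{AB} = \sqrt{\frac{C_8^{AB}}{C_6^{AB}}}
```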

ωB97M-V functional with density fitting and VV10 nonlocal correction


```python
mf = rks.RKS(mol, xc='wb97m-v').density_fit()
mf.kernel()
```

Geometry Optimization & Transition State Search

We use geomeTRIC for geometry optimization and transition state search. geomeTRIC was already installed as a dependency of GPU4PySCF.


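```python
from pyscf.geomopt.geometric_solver import optimize

mf = rks.RKS(mol, xc='b3lyp').density_fit()
mf.disp = 'd3bj'

# Minimize the energy to find the equilibrium geometry
mol_eq = optimize(mf, maxsteps=50)
print("Optimized equilibrium state:")
print(mol_eq.atom_coords())   # Cartesian coordinates in Bohr

# Ask geomeTRIC for a transition state (first-order saddle point) instead
mol_ts = optimize(mf, maxsteps=50, transition=True)
print("Optimized transition state:")
print(mol_ts.atom_coords())   # Cartesian coordinates in Bohr
```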
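`atom_coords()` returns Cartesian coordinates in Bohr. To compare optimized structures with the input geometry, it helps to reduce them to bond lengths and angles. Below is a minimal standard-library sketch. The conversion factor is the Bohr radius in Angstrom (CODATA 2010 value). `bond_length` and `bond_angle` are illustrative helpers.

```python
import math

BOHR_TO_ANGSTROM = 0.52917721092  # Bohr radius in Angstrom (CODATA 2010)


def bond_length(a, b):
    """Distance between two Cartesian points, in the units of the input."""
    return math.dist(a, b)


def bond_angle(a, center, b):
    """Angle a-center-b in degrees."""
    u = [x - y for x, y in zip(a, center)]
    v = [x - y for x, y in zip(b, center)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


# Input water geometry (Angstrom), atom order O, H, H as in `water` above.
O = (0.0, 0.0, 0.1174)
H1 = (-0.757, 0.0, -0.4696)
H2 = (0.757, 0.0, -0.4696)
print(f"r(OH) = {bond_length(O, H1):.4f} A, angle(HOH) = {bond_angle(H1, O, H2):.2f} deg")
```

The input geometry gives r(OH) ≈ 0.958 Å and ∠HOH ≈ 104.4°. For an optimized structure, convert first, e.g. `coords = mol_eq.atom_coords() * BOHR_TO_ANGSTROM`, then call `bond_angle(coords[1], coords[0], coords[2])`.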

Solvation Free Energy using SMD model


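```python
mf = rks.RKS(mol, xc='B3LYP').density_fit()
e_gas = mf.kernel()                  # gas-phase energy

mf = mf.SMD()                        # switch on the SMD solvation model
mf.with_solvent.solvent = 'water'    # solvent to use
e_sol = mf.kernel()                  # energy in solution

print('Solvation Free Energy:', e_sol - e_gas, 'Hartree')
```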
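The printed difference is in Hartree, while solvation free energies are commonly quoted in kcal/mol. Here is a sketch of the conversion (1 Hartree ≈ 627.5095 kcal/mol). `solvation_free_energy` is an illustrative helper. The energies in the demo line are hypothetical placeholders, not results from this notebook.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 Hartree in kcal/mol


def solvation_free_energy(e_gas: float, e_sol: float) -> dict:
    """Delta G_solv = E(solution) - E(gas), reported in Hartree and in kcal/mol."""
    delta = e_sol - e_gas
    return {"hartree": delta, "kcal/mol": delta * HARTREE_TO_KCAL_PER_MOL}


# Hypothetical energies for illustration only; use e_gas and e_sol from the cell above.
result = solvation_free_energy(e_gas=-76.400, e_sol=-76.410)
print(f"{result['hartree']:.4f} Hartree = {result['kcal/mol']:.2f} kcal/mol")
```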

Open-Shell Calculations


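```python
from gpu4pyscf.dft import uks

mol_open = mol.copy()
mol_open.charge = 1      # water cation
mol_open.spin = 1        # one unpaired electron (spin = 2S)
mol_open.build()         # rebuild so the new charge and spin are applied
mf = uks.UKS(mol_open, xc='b3lyp').density_fit()
mf.kernel()
```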
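For an unrestricted open-shell calculation, a quick sanity check is to compare the computed <S^2> with the exact value S(S+1), where S = spin/2 in PySCF's convention. PySCF's unrestricted SCF objects provide `spin_square()`, which returns <S^2> and 2S+1. This notebook does not show whether the GPU4PySCF object exposes it directly, so calling it after `mf.to_cpu()` is one option. Here is a sketch of the reference values. `ideal_s_squared` is an illustrative helper.

```python
def ideal_s_squared(spin: int) -> tuple:
    """Exact <S^2> = S(S+1) and multiplicity 2S+1 for PySCF's spin = 2S = N_alpha - N_beta."""
    s = spin / 2
    return s * (s + 1), 2 * s + 1


expected_ss, multiplicity = ideal_s_squared(1)  # mol_open.spin = 1
print(expected_ss, multiplicity)  # 0.75 2.0
```

A computed <S^2> noticeably above 0.75 for this doublet would indicate spin contamination.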

Density Fitting MP2


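```python
from gpu4pyscf import scf, mp

mf = scf.RHF(mol).density_fit()
e_hf = mf.kernel()               # Hartree-Fock reference energy

ptobj = mp.dfmp2.DFMP2(mf)       # DF-MP2 on top of the RHF reference
e_corr, t2 = ptobj.kernel()      # MP2 correlation energy and amplitudes
e_mp2 = e_hf + e_corr
print('E(MP2) =', e_mp2, 'Hartree')
```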
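`e_corr` above is the closed-shell MP2 correlation energy. With density fitting, the four-index integrals are assembled from three-index factors. The standard RI-MP2 expressions are shown below for reference. Here $i,j$ run over occupied and $a,b$ over virtual orbitals, $\varepsilon$ are orbital energies, and $P,Q$ label auxiliary basis functions. The exact factorization used inside GPU4PySCF is not described in this notebook.

```latex
E_{\text{corr}}^{\text{MP2}} = \sum_{ij}^{\text{occ}} \sum_{ab}^{\text{vir}}
  \frac{(ia|jb)\left[\,2(ia|jb) - (ib|ja)\,\right]}
       {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b},
\qquad
(ia|jb) \approx \sum_{P} B_{ia}^{P} B_{jb}^{P},
\qquad
B_{ia}^{P} = \sum_{Q} (ia|Q)\,\left[\mathbf{J}^{-1/2}\right]_{QP},
\quad J_{PQ} = (P|Q)
```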
