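Using NistChemPy
Tags: NistChempy, python
Author: CLiu
Updated 2024-11-09
Recommended image: Basic Image:ubuntu22.04-py3.10-irkernel-r4.4.1
Recommended machine type: c2_m4_cpu

Contents
Tutorial: using nistchempy with the NIST Chemistry WebBook
Compound properties
Initialization
Properties
Basic properties
Reference properties
Extracted properties
Data extraction example
MOL files
Spectra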
[1]
!pip install nistchempy
!pip install rdkit
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting nistchempy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bc/a1/d81af983832114e23a1cf6ff8ceaa345d49904a72a72762c8aac6bb55b8a/NistChemPy-1.0.2-py3-none-any.whl (10.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.6/10.6 MB 27.1 MB/s eta 0:00:00
Requirement already satisfied: requests in /opt/mamba/lib/python3.10/site-packages (from nistchempy) (2.28.1)
Requirement already satisfied: beautifulsoup4 in /opt/mamba/lib/python3.10/site-packages (from nistchempy) (4.11.2)
Requirement already satisfied: pandas in /opt/mamba/lib/python3.10/site-packages (from nistchempy) (1.5.3)
Requirement already satisfied: soupsieve>1.2 in /opt/mamba/lib/python3.10/site-packages (from beautifulsoup4->nistchempy) (2.4)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/mamba/lib/python3.10/site-packages (from pandas->nistchempy) (2.8.2)
Requirement already satisfied: numpy>=1.21.0 in /opt/mamba/lib/python3.10/site-packages (from pandas->nistchempy) (1.24.2)
Requirement already satisfied: pytz>=2020.1 in /opt/mamba/lib/python3.10/site-packages (from pandas->nistchempy) (2022.7.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/mamba/lib/python3.10/site-packages (from requests->nistchempy) (2022.9.24)
Requirement already satisfied: idna<4,>=2.5 in /opt/mamba/lib/python3.10/site-packages (from requests->nistchempy) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/mamba/lib/python3.10/site-packages (from requests->nistchempy) (1.26.11)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/mamba/lib/python3.10/site-packages (from requests->nistchempy) (2.1.1)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas->nistchempy) (1.16.0)
Installing collected packages: nistchempy
Successfully installed nistchempy-1.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting rdkit
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d2/f3/9125802d1403f56fc6d758dbec3a66fae6ad7023d396ecf5a29af27c78aa/rdkit-2024.3.6-cp310-cp310-manylinux_2_28_x86_64.whl (32.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32.8/32.8 MB 6.1 MB/s eta 0:00:00
Requirement already satisfied: Pillow in /opt/mamba/lib/python3.10/site-packages (from rdkit) (10.4.0)
Requirement already satisfied: numpy in /opt/mamba/lib/python3.10/site-packages (from rdkit) (1.24.2)
Installing collected packages: rdkit
Successfully installed rdkit-2024.3.6
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
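
Tutorial: using nistchempy with the NIST Chemistry WebBook

This tutorial shows how to use the nistchempy library to retrieve basic compound properties and spectral data from the NIST Chemistry WebBook. The examples look compounds up by NIST compound ID, CAS Registry Number, and InChI string.

Compound properties

Initialization

A NIST Chemistry WebBook compound can be initialized from its NIST compound ID, CAS Registry Number, or InChI string:
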
[2]
import nistchempy as nist

X = nist.get_compound('C632053')
X
NistCompound(ID=C632053)
[3]
X = nist.get_compound('632-05-3')
X
NistCompound(ID=C632053)
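
The ID and CAS lookups return the same record, and the printed ID looks like the CAS number with its hyphens removed and a `C` prefix added. The sketch below validates a CAS number's check digit and builds that ID form. The helper names `cas_is_valid` and `cas_to_nist_id` are hypothetical, and the `C` + digits pattern is only inferred from the examples in this notebook (C632053, C120127), so it should not be relied on for entries that lack a CAS number.

```python
import re

def cas_is_valid(cas_rn: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

def cas_to_nist_id(cas_rn: str) -> str:
    """Build the 'C' + digits form seen in this notebook (assumed pattern)."""
    if not cas_is_valid(cas_rn):
        raise ValueError(f"not a valid CAS number: {cas_rn!r}")
    return "C" + cas_rn.replace("-", "")

print(cas_to_nist_id("632-05-3"))  # C632053
```

Validating the check digit first catches typing errors before any request is sent to the WebBook.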
[4]
X = nist.get_compound('InChI=1S/C4H7Br3/c1-3(6)4(7)2-5/h3-4H,2H2,1H3')
X
NistCompound(ID=C632053)

If the NIST Chemistry WebBook has no compound for the given identifier, nist.get_compound returns None. It also returns None when more than one substance matches the given InChI.

See https://webbook.nist.gov/chemistry/ for details.
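
Because a failed lookup returns None rather than raising an error, calling code needs an explicit check. The following is a minimal sketch of one way to try several identifiers in turn. `first_match` is a hypothetical helper, not part of nistchempy, and the dictionary stands in for the WebBook so that the example runs offline.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

def first_match(
    identifiers: Iterable[str],
    lookup: Callable[[str], Optional[T]],
) -> Tuple[Optional[str], Optional[T]]:
    """Return the first identifier that resolves, together with its result."""
    for identifier in identifiers:
        result = lookup(identifier)
        if result is not None:  # None means "not found" or "ambiguous"
            return identifier, result
    return None, None

# Stand-in lookup table (an assumption for this sketch); in the notebook
# one would pass nist.get_compound as the lookup function instead.
fake_db = {"632-05-3": "NistCompound(ID=C632053)"}

found_id, compound = first_match(["000-00-0", "632-05-3"], fake_db.get)
print(found_id, compound)
```

Passing the lookup function as an argument keeps the helper testable without network access.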

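Properties

A nist.compound.NistCompound object holds the information extracted from the compound's web page in the NIST Chemistry WebBook. Its attributes fall into three groups:

Basic properties

  • ID: NIST compound ID;
  • name: chemical name;
  • synonyms: synonyms;
  • formula: chemical formula;
  • mol_weight: molecular weight;
  • inchi / inchi_key: InChI / InChIKey strings;
  • cas_rn: CAS Registry Number.

Reference properties

Reference properties are dictionaries of the form {property name => URL}. There are four subgroups:

  • mol_refs: molecular properties, including the 2D and 3D MOL files;
  • data_refs: WebBook properties, stored in the NIST Chemistry WebBook itself;
  • nist_public_refs: other properties, stored on public NIST websites;
  • nist_subscription_refs: other properties, stored on subscription-only NIST websites.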
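
Since each reference property is a plain URL, its parts can be inspected with the standard library. The sketch below takes one data_refs URL for anthracene, copied from the output later in this notebook, and splits it into host, query parameters, and page anchor. What the `Mask` value means to the WebBook is not interpreted here.

```python
from urllib.parse import parse_qs, urlsplit

# data_refs['cTG'] for anthracene (C120127), copied from this notebook.
url = "https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=1#Thermo-Gas"

parts = urlsplit(url)
# parse_qs returns a list per key; each key occurs once here.
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.netloc)    # webbook.nist.gov
print(query)           # {'ID': 'C120127', 'Units': 'SI', 'Mask': '1'}
print(parts.fragment)  # Thermo-Gas
```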
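
Extracted properties

Extracted properties are retrieved from the URLs stored in the reference properties. They start out empty (None or an empty list) and are filled by the loader methods shown below:

  • mol2D / mol3D: text blocks of the 2D / 3D MOL files;
  • ir_specs / thz_specs / ms_specs / uv_specs: IR / THz / MS / UV spectra as JDX-format text blocks.

Data extraction example
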
[5]
s = nist.run_search('anthracene', 'name')
X = s.compounds[0]
X.__dict__
{'ID': 'C120127',
 'name': 'Anthracene',
 'synonyms': ['Anthracin',
  'Green Oil',
  'Paranaphthalene',
  'Tetra Olive N2G',
  'Anthracene oil',
  'p-Naphthalene',
  'Anthracen',
  'Coal tar pitch volatiles:anthracene',
  'Sterilite hop defoliant'],
 'formula': 'C14H10',
 'mol_weight': 178.2292,
 'inchi': 'InChI=1S/C14H10/c1-2-6-12-10-14-8-4-3-7-13(14)9-11(12)5-1/h1-10H',
 'inchi_key': 'MWPLVEDNUUSJAV-UHFFFAOYSA-N',
 'cas_rn': '120-12-7',
 'mol_refs': {'mol2D': 'https://webbook.nist.gov/cgi/cbook.cgi?Str2File=C120127',
  'mol3D': 'https://webbook.nist.gov/cgi/cbook.cgi?Str3File=C120127'},
 'data_refs': {'cTG': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=1#Thermo-Gas',
  'cTC': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=2#Thermo-Condensed',
  'cTP': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=4#Thermo-Phase',
  'cTR': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=8#Thermo-React',
  'cSO': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=10#Solubility',
  'cIE': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=20#Ion-Energetics',
  'cIC': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=40#Ion-Cluster',
  'cIR': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=80#IR-Spec',
  'cMS': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=200#Mass-Spec',
  'cUV': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=400#UV-Vis-Spec',
  'cGC': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C120127&Units=SI&Mask=2000#Gas-Chrom'},
 'nist_public_refs': {'Gas Phase Kinetics Database': 'https://kinetics.nist.gov/kinetics/rpSearch?cas=120127',
  'X-ray Photoelectron Spectroscopy Database, version 5.0': 'https://srdata.nist.gov/xps/SpectralByCompdDd/21197',
  'NIST Polycyclic Aromatic Hydrocarbon Structure Index': 'https://pah.nist.gov/?q=pah015'},
 'nist_subscription_refs': {'NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data)': 'https://wtt-lite.nist.gov/wtt-lite/index.html?cmp=anthracene',
  'NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)': 'https://wtt-pro.nist.gov/wtt-pro/index.html?cmp=anthracene'},
 'nist_response': NistResponse(ok=True, content_type='text/html; charset=UTF-8'),
 'mol2D': None,
 'mol3D': None,
 'ir_specs': [],
 'thz_specs': [],
 'ms_specs': [],
 'uv_specs': []}
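
As a quick sanity check on the basic properties, the molecular weight can be recomputed from the formula. This is a minimal sketch: `formula_weight` is a hypothetical helper, it handles only simple formulas without brackets or charges, and the atomic weights are assumed values chosen for this example. They reproduce the `mol_weight` above to four decimals, which does not by itself show which values the WebBook uses.

```python
import re

# Atomic weights in g/mol (assumed values; only C and H are needed here).
ATOMIC_WEIGHTS = {"C": 12.0107, "H": 1.00794}

def formula_weight(formula: str) -> float:
    """Sum the atomic weights for a simple formula such as 'C14H10'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

print(round(formula_weight("C14H10"), 4))  # 178.2292
```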

MOL files

To load MOL files, use the get_mol2D, get_mol3D, or get_molfiles methods:

[6]
X.get_molfiles()
[7]
def format_mol2d(mol2d):
    # Split the MOL block into lines; splitlines() also handles CRLF endings
    lines = mol2d.strip().splitlines()
    # Header block (three lines) followed by the counts line
    title = lines[0]          # molecule name and NIST ID
    program_line = lines[1]   # program name, timestamp, dimensionality
    comment = lines[2]        # NIST stores its copyright notice here
    counts_line = lines[3]    # atom count, bond count, ..., V2000 tag
    # The first two 3-character fields hold the atom and bond counts
    n_atoms = int(counts_line[0:3])
    n_bonds = int(counts_line[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # Build the formatted output
    separator = "\n" + "-" * 30 + "\n"
    formatted_output = [
        "Molecule:",
        f"- {title}",
        f"- Program line: {program_line}",
        f"- Comment: {comment}",
        separator,
        "Counts line (V2000):",
        counts_line,
        separator,
        f"Atom block ({n_atoms} atoms):",
        *atom_lines,
        separator,
        f"Bond block ({n_bonds} bonds):",
        *bond_lines,
        "\nM  END",
    ]
    return "\n".join(formatted_output)

# Example data: the 2D MOL block loaded by X.get_molfiles()
mol2d_example = X.mol2D
# Format it and print the result
formatted_result = format_mol2d(mol2d_example)
print(formatted_result)

Molecule:
- Anthracene, ID: C120127
- Program line:   NIST    24110905222D 1   1.00000     0.00000      
- Comment: Copyright by the U.S. Sec. Commerce on behalf of U.S.A. All rights reserved.

------------------------------

Counts line (V2000):
 14 16  0     0  0              1 V2000

------------------------------

Atom block (14 atoms):
    0.0000    1.4838    0.0000 C   0  0  0  0  0  0           0  0  0
    0.0000    0.5117    0.0000 C   0  0  0  0  0  0           0  0  0
    0.8698    1.9955    0.0000 C   0  0  0  0  0  0           0  0  0
    0.8698    0.0000    0.0000 C   0  0  0  0  0  0           0  0  0
    1.7397    0.5117    0.0000 C   0  0  0  0  0  0           0  0  0
    1.7397    1.4838    0.0000 C   0  0  0  0  0  0           0  0  0
    2.5583    1.9955    0.0000 C   0  0  0  0  0  0           0  0  0
    2.5583    0.0000    0.0000 C   0  0  0  0  0  0           0  0  0
    3.4793    0.5117    0.0000 C   0  0  0  0  0  0           0  0  0
    3.4793    1.4838    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3492    1.9955    0.0000 C   0  0  0  0  0  0           0  0  0
    4.3492    0.0000    0.0000 C   0  0  0  0  0  0           0  0  0
    5.2190    0.5117    0.0000 C   0  0  0  0  0  0           0  0  0
    5.2190    1.4838    0.0000 C   0  0  0  0  0  0           0  0  0

------------------------------

Bond block (16 bonds):
  2  1  2  0     0  0
  1  3  1  0     0  0
  4  2  1  0     0  0
  3  6  2  0     0  0
  5  4  2  0     0  0
  5  6  1  0     0  0
  8  5  1  0     0  0
  6  7  1  0     0  0
  7 10  2  0     0  0
  9  8  2  0     0  0
  9 10  1  0     0  0
 12  9  1  0     0  0
 10 11  1  0     0  0
 11 14  2  0     0  0
 13 12  2  0     0  0
 14 13  1  0     0  0

M  END
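
When the coordinates are needed as numbers rather than text, the V2000 atom and bond blocks can be read with fixed-width slices. The sketch below is a simplified reader that ignores charges, isotopes, and property lines. `parse_molblock` is a hypothetical helper, and the two-atom sample block is made up for illustration, not taken from the WebBook.

```python
def parse_molblock(text):
    """Return (atoms, bonds) from a V2000 MOL block.

    atoms: list of (symbol, x, y, z); bonds: list of (atom1, atom2, order).
    """
    lines = text.splitlines()
    counts = lines[3]  # the line after the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Three 10-character coordinate fields, a space, then the symbol.
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Three 3-character fields: first atom, second atom, bond order.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds

# Hypothetical sample: two carbons joined by a double bond.
SAMPLE_MOL = "\n".join([
    "Ethene (hypothetical sample)",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.3300    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0",
    "M  END",
])

atoms, bonds = parse_molblock(SAMPLE_MOL)
print(atoms)
print(bonds)
```

In the notebook the same function could be applied to X.mol2D. For anything beyond a quick look, RDKit's Chem.MolFromMolBlock, used in the next cell, is the more robust choice.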
[8]
from rdkit import Chem

mol = Chem.MolFromMolBlock(X.mol2D)
mol

Spectra

To load spectra, use the get_ir_spectra, get_thz_spectra, get_ms_spectra, get_uv_spectra, or get_all_spectra methods:

[9]
X.ir_specs, X.thz_specs, X.ms_specs, X.uv_specs
([], [], [], [])
[10]
X.get_ms_spectra()
X.ir_specs, X.thz_specs, X.ms_specs, X.uv_specs

([], [], [Spectrum(C120127, Mass spectrum #0)], [])

A Spectrum object holds the spectrum as a JDX-format text block, which contains both the metadata and the spectral data:

[11]
ms = X.ms_specs[0]
print(ms.jdx_text)

##TITLE=Anthracene
##JCAMP-DX=4.24
##DATA TYPE=MASS SPECTRUM
##ORIGIN=Japan AIST/NIMC Database- Spectrum MS-NW- 132
##OWNER=NIST Mass Spectrometry Data Center
Collection (C) 2014 copyright by the U.S. Secretary of Commerce
on behalf of the United States of America. All rights reserved.
##CAS REGISTRY NO=120-12-7
##$NIST MASS SPEC NO=228201
##MOLFORM=C14 H10
##MW=178
##$NIST SOURCE=MSDC
##XUNITS=M/Z
##YUNITS=RELATIVE INTENSITY
##XFACTOR=1
##YFACTOR=1
##FIRSTX=27
##LASTX=181
##FIRSTY=20
##MAXX=181
##MINX=27
##MAXY=9999
##MINY=10
##NPOINTS=62
##PEAK TABLE=(XY..XY)
27,20 28,10 38,30 39,109
50,129 51,129 52,30 61,40
62,129 63,289 64,20 65,20
69,20 73,10 74,219 75,299
76,619 77,80 78,10 83,50
85,30 86,99 87,169 88,439
89,759 90,10 98,119 99,90
100,50 101,50 102,60 110,40
111,50 113,60 114,20 115,50
122,40 123,20 124,20 125,50
126,149 127,60 128,80 137,30
138,30 139,209 140,80 149,70
150,419 151,629 152,689 153,80
163,50 164,20 174,129 175,199
176,1409 177,799 178,9999 179,1569
180,149 181,30
##END=
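
The metadata records in this text all follow the `##LABEL=value` pattern, so they can be collected into a dictionary. The sketch below does this for a shortened copy of the anthracene header. `parse_jdx_header` is a hypothetical helper; treating lines without `##` as continuations of the previous record is an assumption based on the OWNER record above.

```python
def parse_jdx_header(jdx_text):
    """Collect '##LABEL=value' records that precede the data table."""
    header = {}
    last_label = None
    for line in jdx_text.splitlines():
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip()
            if label in ("PEAK TABLE", "XYDATA", "END"):
                break  # the data table (or the end marker) starts here
            header[label] = value.strip()
            last_label = label
        elif last_label is not None and line.strip():
            # A line without '##' continues the previous record.
            header[last_label] += " " + line.strip()
    return header

# Shortened copy of the JDX text printed above.
sample = """##TITLE=Anthracene
##JCAMP-DX=4.24
##DATA TYPE=MASS SPECTRUM
##OWNER=NIST Mass Spectrometry Data Center
Collection (C) 2014 copyright by the U.S. Secretary of Commerce
on behalf of the United States of America. All rights reserved.
##CAS REGISTRY NO=120-12-7
##NPOINTS=62
##PEAK TABLE=(XY..XY)
27,20 28,10 38,30 39,109
##END=
"""

header = parse_jdx_header(sample)
print(header["TITLE"], header["CAS REGISTRY NO"], header["NPOINTS"])
```

In the notebook, `parse_jdx_header(ms.jdx_text)` would return the full set of records.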

[29]
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Use 'DejaVu Sans', a font that ships with Matplotlib
rcParams['font.family'] = 'DejaVu Sans'

# ms.jdx_text holds the mass spectrum in JCAMP-DX format; split it into lines
jdx_text = ms.jdx_text.splitlines()

# Find the line after "##PEAK TABLE", where the peak data starts
try:
    start_index = next(i for i, line in enumerate(jdx_text) if line.startswith("##PEAK TABLE")) + 1
    peak_data = jdx_text[start_index:]  # lines after "##PEAK TABLE"
except StopIteration:
    print("No peak table data found. Please check the input file format.")
    peak_data = []

# Convert the data into m/z and relative intensity values
mz_values = []
intensity_values = []

if peak_data:
    for line in peak_data:
        peaks = line.split()  # each line holds several "m/z,intensity" pairs
        for peak in peaks:
            try:
                mz, intensity = map(float, peak.split(","))
                mz_values.append(mz)
                intensity_values.append(intensity)
            except ValueError:
                if peak.strip() == "##END=":
                    continue  # ignore the END marker
                print(f"Unable to parse peak data: {peak}")

    # Report the maximum and minimum intensity values
    if intensity_values:
        max_intensity = max(intensity_values)
        min_intensity = min(intensity_values)
        print(f"Max Intensity: {max_intensity}")
        print(f"Min Intensity: {min_intensity}")

        # Plot the mass spectrum
        plt.figure(figsize=(10, 6))
        plt.bar(mz_values, intensity_values, width=0.5, color='blue', edgecolor='black')
        plt.title("Mass Spectrum of Anthracene")
        plt.xlabel("m/z (Mass-to-Charge Ratio)")
        plt.ylabel("Relative Intensity")
        plt.show()
    else:
        print("No valid intensity data found.")
else:
    print("Peak table data is empty; cannot plot mass spectrum.")

Max Intensity: 9999.0
Min Intensity: 10.0
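
The extreme values alone say little about the spectrum. A short follow-up, sketched here with the standard library only, ranks the peaks and rescales them to the base peak. The peak table is copied from the JDX text printed earlier, and the variable names are illustrative.

```python
# Peak table copied from the anthracene JDX text ("m/z,intensity" pairs).
PEAK_TABLE = """
27,20 28,10 38,30 39,109
50,129 51,129 52,30 61,40
62,129 63,289 64,20 65,20
69,20 73,10 74,219 75,299
76,619 77,80 78,10 83,50
85,30 86,99 87,169 88,439
89,759 90,10 98,119 99,90
100,50 101,50 102,60 110,40
111,50 113,60 114,20 115,50
122,40 123,20 124,20 125,50
126,149 127,60 128,80 137,30
138,30 139,209 140,80 149,70
150,419 151,629 152,689 153,80
163,50 164,20 174,129 175,199
176,1409 177,799 178,9999 179,1569
180,149 181,30
"""

# Turn every "m/z,intensity" token into a pair of integers.
peaks = [tuple(int(v) for v in pair.split(",")) for pair in PEAK_TABLE.split()]

# The base peak is the most intense one; intensities are scaled against it.
base_mz, base_intensity = max(peaks, key=lambda p: p[1])

top5 = sorted(peaks, key=lambda p: p[1], reverse=True)[:5]
for mz, intensity in top5:
    print(f"m/z {mz:>3}: {100 * intensity / base_intensity:5.1f} %")
```

The base peak falls at m/z 178, which matches the `##MW=178` record in the header, and the table holds 62 pairs, in line with `##NPOINTS=62`.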