Group Contribution Method
Cheminformatics and Intelligent Product Engineering
Lei Zhang
liuqilei@dlut.edu.cn
Updated 2025-04-15
Recommended image: leiz-dlut:chem
Recommended machine type: c2_m4_cpu

To run this notebook, first join the following project: bohrium.dp.tech/projects/share/455831

Image: Custom image -> leiz-dlut:chem

Group Contribution Method (GC method)

Marrero, Jorge A. and Rafiqul Gani. “Group-contribution based estimation of pure component properties.” Fluid Phase Equilibria 183 (2001): 183-208.

https://www.sciencedirect.com/science/article/pii/S0378381201004319

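1. Group Fragmentation Algorithm

Given the SMILES string of a molecule, the molecule is split into a set of groups using predefined group SMARTS patterns, and the name and number of occurrences of each group are counted.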

Import the RDKit library

from __future__ import division

__all__ = ['smarts_fragment']

import numpy
from collections import Counter
import os

try:
    from rdkit import Chem
    hasRDKit = True
except ImportError:  # pragma: no cover
    hasRDKit = False

rdkit_missing = 'RDKit is not installed; it is required to use this functionality'
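
Define the functions for substructure search, merging and counting.

  • Substructure matching: RDKit SMARTS pattern matching finds every occurrence of each group in the molecule.
  • Atom coverage check: every atom must be covered by at least one group; otherwise status code 3 is returned (the group set cannot describe the molecule). Status code 4 is returned when no selection of mutually non-overlapping groups covers all atoms.
  • Substructure merging: matches are sorted so that groups with more atoms come first, then matches that do not overlap the atoms already claimed are accepted one by one, which prevents an atom from being counted twice.
  • Output: the group counts, the match flag (success) and the status code (status).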
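
The merging step described above can be illustrated without RDKit. The sketch below is a simplified stand-in, not the notebook's implementation: the function name `select_non_overlapping` and the hand-written match tuples (atom indices for diethyl ether, `CCOCC`) are assumptions made for illustration.

```python
from collections import Counter

def select_non_overlapping(matches):
    """Greedy selection of non-overlapping matches, largest groups first.

    `matches` is a list of (group_name, atom_indices) pairs, in the shape a
    substructure search might produce (hypothetical input format).
    """
    # Stable sort: groups with more atoms come first.
    ordered = sorted(matches, key=lambda m: len(m[1]), reverse=True)
    covered = set()
    chosen = []
    for name, atoms in ordered:
        if covered.isdisjoint(atoms):   # accept only if no atom is reused
            covered.update(atoms)
            chosen.append(name)
    return Counter(chosen), covered

# Hand-written matches for CCOCC (atoms 0..4 = C, C, O, C, C).
matches = [
    ("CH3", (0,)), ("CH3", (4,)),
    ("CH2", (1,)), ("CH2", (3,)),
    ("CH2O", (1, 2)), ("CH2O", (3, 2)),
    ("-O-", (2,)),
]
counts, covered = select_non_overlapping(matches)
complete = covered == {0, 1, 2, 3, 4}   # True when every atom is described
print(dict(counts), complete)
```

With these inputs the selection keeps one CH2O, two CH3 and one CH2, which is the same decomposition the notebook reports for `CCOCC` in its test output.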

def smarts_fragment(catalog, rdkitmol = None, smi = None):
    """Fragment a molecule into groups.

    catalog maps a group id to its SMARTS pattern. Returns (counts, success, status).
    status: 1 = OK, 2 = the SMILES could not be parsed, 3 = some atoms are matched
    by no group, 4 = no set of non-overlapping groups covers the molecule.
    """
    if not hasRDKit: # pragma: no cover
        raise Exception(rdkit_missing)
    if rdkitmol is None and smi is None:
        raise Exception('Either an rdkit mol or a smiles string is required')
    if smi is not None:
        rdkitmol = Chem.MolFromSmiles(smi)
        if rdkitmol is None:
            status = 2 # failed to construct the mol
            success = False
            return {}, success, status

    atom_count = len(rdkitmol.GetAtoms())
    status = 1 # OK
    success = True

    # Substructure search results
    counts = {}
    all_matches = {}
    for key, smart in catalog.items():
        patt = Chem.MolFromSmarts(smart)
        hits = rdkitmol.GetSubstructMatches(patt)
        if hits:
            all_matches[smart] = hits
            counts[key] = len(hits)
    # Indices of all atoms matched by at least one group
    matched_atoms = set()
    for i in all_matches.values():
        for j in i:
            matched_atoms.update(j)
    if len(matched_atoms) != atom_count:
        # Even with every group in use, the current group set cannot describe
        # this molecule; more group definitions are needed.
        status = 3
        success = False
    # Flatten the matches into [group id, atom indices] pairs
    Substructure_group_type = []
    Substructure_group_site = []
    for i in range(len(list(all_matches.keys()))):
        for j in range(len(list(all_matches.values())[i])):
            Substructure_group_type.append(list(counts.keys())[i])
            Substructure_group_site.append(list(all_matches.values())[i][j])
    Substructure = [[''] * 2 for _ in range(sum(counts.values()))]
    for i in range(sum(counts.values())):
        Substructure[i][0] = Substructure_group_type[i]
        Substructure[i][1] = Substructure_group_site[i]
    if not Substructure:
        # Nothing matched at all, so there is nothing to merge or count.
        return {}, success, status
    # Sort the substructures so that larger ones (more atoms) come first
    def bubble_sort(list_target):
        count = len(list_target)
        for i in range(count):
            for j in range(i + 1, count):
                if len(list_target[i][1]) < len(list_target[j][1]):
                    list_target[i], list_target[j] = list_target[j], list_target[i]
        return list_target
    Substructure = bubble_sort(Substructure)
    # Merge substructures: accept matches that do not overlap earlier ones
    record = [0]
    if matched_atoms > set(Substructure[0][1]): # otherwise the first substructure already represents the whole molecule (a single-molecule group)
        compare_set = set(Substructure[0][1])
        for i in range(sum(counts.values())):
            if (compare_set & set(Substructure[i][1])) == set():
                compare_set = compare_set | set(Substructure[i][1])
                record.append(i)
        if matched_atoms > compare_set:
            # Mutually independent groups cannot make up the whole molecule;
            # more group definitions are needed.
            status = 4
            success = False
    # Count the selected substructures
    Substructure_new = [[''] * 2 for _ in range(len(record))] # extract the selected matches first
    for i in range(len(record)):
        Substructure_new[i][0] = Substructure[record[i]][0]
        Substructure_new[i][1] = Substructure[record[i]][1]
    group_num = [] # then count them
    for i in range(len(record)):
        group_num.append(Substructure_new[i][0])
    counts_new = {}
    for i in range(len(record)):
        counts_new[Substructure_new[i][0]] = Counter(group_num)[Substructure_new[i][0]]
    return counts_new, success, status
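
Define the molecule fragmentation function and output the fragmentation result

The predefined group SMARTS data (stored in Group_SMARTS.npy) are loaded, and the input SMILES strings are fragmented in a batch.

The function returns the fragmentation result for each molecule, including the success flag and the status code, for use in the property calculations that follow.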
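
The per-molecule result is stored as one fixed-length row: one column per group, followed by the success flag and the status code. A minimal sketch of that layout, assuming a toy catalog of four groups (the helper name `counts_to_row` is hypothetical and does not appear in the notebook):

```python
def counts_to_row(counts, n_groups, success, status):
    """Expand {group_id: count} (1-based ids) into a fixed-length row.

    The last two entries hold the success flag and the status code.
    """
    row = [0] * (n_groups + 2)
    for group_id, count in counts.items():
        row[group_id - 1] = count
    row[n_groups] = int(success)
    row[n_groups + 1] = status
    return row

# Butane with a toy catalog of 4 groups: id 1 = CH3, id 2 = CH2.
row = counts_to_row({1: 2, 2: 2}, n_groups=4, success=True, status=1)
print(row)  # [2, 2, 0, 0, 1, 1]
```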

def SMILES2Group(molecules):
    # Load the group SMARTS data
    Group_SMARTS = numpy.load(os.path.join('/share/GC', 'Group_SMARTS.npy'))
    Group_SMARTS = Group_SMARTS.tolist()

    # Build the fragmentation result (full version):
    # one column per group, then the success flag and the status code
    Group_SMARTS_id_dict = {i + 1: j[2] for i, j in enumerate(Group_SMARTS)}
    x = [''] * len(molecules)
    Group = numpy.zeros((len(molecules),len(Group_SMARTS_id_dict) + 2))
    for index, molname in enumerate(molecules):
        x[index] = smarts_fragment(catalog = Group_SMARTS_id_dict, smi = molname.strip())
        Group[index,len(Group_SMARTS_id_dict)] = x[index][1]
        Group[index,len(Group_SMARTS_id_dict) + 1] = x[index][2]
        for key, value in x[index][0].items():
            Group[index,key - 1] = value

    # Tab-separated header: group names, then 'success' and 'status'
    Group_set = ''
    for i in range(len(Group_SMARTS_id_dict)):
        Group_set = Group_set + Group_SMARTS[i][1] + '\t'
    Group_set = Group_set + 'success\t' + 'status'

    # Human-readable report listing only the groups that occur
    Group_set_split = Group_set.strip().split('\t')
    string_group = ''
    for i in range(numpy.size(Group,0)):
        string_group = string_group + str(i + 1) + '\t' + molecules[i].strip() + '\n'
        for j in range(numpy.size(Group,1) - 2): # write the group counts
            if Group[i,j] != 0:
                string_group = string_group + Group_set_split[j] + '\t' + str(int(Group[i,j])) + '\n'
        string_group = string_group + Group_set_split[-2] + '\t' + str(int(Group[i,-2])) + '\n' + Group_set_split[-1] + '\t' + str(int(Group[i,-1]))
        string_group = string_group + '\n--------------------------------------------------------------------------------\n'
    return Group, string_group

Test

if __name__ == "__main__":
    Group_SMARTS = numpy.load(os.path.join('/share/GC', 'Group_SMARTS.npy'))
    Group_SMARTS = Group_SMARTS.tolist()

    # Read the input SMILES
    with open('/share/GC/input.txt',mode = 'r') as fs:
        molecules = fs.readlines()

    print(molecules)

    # Build the fragmentation result (full version)
    Group_SMARTS_id_dict = {i + 1: j[2] for i, j in enumerate(Group_SMARTS)}
    x = [''] * len(molecules)
    Group = numpy.zeros((len(molecules),len(Group_SMARTS_id_dict) + 2))
    for index, molname in enumerate(molecules):
        x[index] = smarts_fragment(catalog = Group_SMARTS_id_dict, smi = molname.strip())
        Group[index,len(Group_SMARTS_id_dict)] = x[index][1]
        Group[index,len(Group_SMARTS_id_dict) + 1] = x[index][2]
        for key, value in x[index][0].items():
            Group[index,key - 1] = value
    Group_set = ''
    for i in range(len(Group_SMARTS_id_dict)):
        Group_set = Group_set + Group_SMARTS[i][1] + '\t'
    Group_set = Group_set + 'success\t' + 'status'
    numpy.savetxt('/share/GC/Group_output.txt',Group,fmt = '%d',delimiter = '\t',header = Group_set)

    print(Group_set)
    print(Group)
['CCCC\n', 'CCCCO\n', 'CCOCC\n', 'Cc1ccccc1']
CH3	CH2	CH	C	CH2=CH	CH=CH	CH2=C	CH=C	C=C	CH2=C=CH	CH2=C=C	C=C=C	CH#C	C#C	aCH	aC	aC(2)	aC(3)	aN	aC-CH3	aC-CH2	aC-CH	aC-C	aC-CH=CH2	aC-CH=CH	aC-C=CH2	aC-C#CH	aC-C#C	OH	aC-OH	COOH	aC-COOH	CH3CO	CH2CO	CHCO	CCO	aC-CO	CHO	aC-CHO	CH3COO	CH2COO	CHCOO	CCOO	HCOO	aC-COO	aC-OOCH	aC-OOC	COO	CH3O	CH2O	CH-O	C-O	aC-O	CH2NH2	CHNH2	CNH2	CH3NH	CH2NH	CHNH	CH3N	CH2N	aC-NH2	aC-NH	aC-N	NH2	CH=N	C=N	CH2CN	CHCN	CCN	aC-CN	CN	CH2NCO	CHNCO	CNCO	aC-NCO	CH2NO2	CHNO2	CNO2	aC-NO2	NO2	ONO	ONO2	HCON(CH2)2	HCONHCH2	CONH2	CONHCH3	CONHCH2	CON(CH3)2	CONCH3CH2	CON(CH2)2	CONHCO	CONCO	aC-CONH2	aC-NH(CO)H	aC-N(CO)H	aC-CONH	aC-NHCO	aC-(N)CO	NHCONH	NH2CONH	NH2CON	NHCON	NCON	aC-NHCONH2	aC-NHCONH	NHCO	CH2Cl	CHCl	CCl	CHCl2	CCl2	CCl3	CH2F	CHF	CF	CHF2	CF2	CF3	CCl2F	HCClF	CClF2	aC-Cl	aC-F	aC-I	aC-Br	I	Br	F	Cl	CHNOH	CNOH	aC-CHNOH	OCH2CH2OH	OCHCH2OH	OCH2CHOH	O-OH	CH2SH	CHSH	CSH	aC-SH	SH	CH3S	CH2S	CHS	CS	aC-S-	SO	SO2	SO3	SO3(2)	SO4	aC-SO	aC-SO2	PH	P	PO3	PHO3	PO3(2)	PHO4	PO4	aC-PO4	aC-P	CO3	C2H3O	C2H2O	C2HO	CH2(cyc)	CH(cyc)	C(cyc)	CH=CH(cyc)	CH=C(cyc)	C=C(cyc)	CH2=C(cyc)	NH(cyc)	N(cyc)	CH=N(cyc)	C=N(cyc)	O(cyc)	CO(cyc)	S(cyc)	SO2(cyc)	>NH	-O-	-S-	>CO	PO2	CH-N	SiHO	SiO	SiH2	SiH	Si	(CH3)3N	N=N	Ccyc=N-	Ccyc=CH-	Ccyc=NH	N=O	Ccyc=C	P=O	N=N(2)	C=NH	>C=S	aC-CON	aC=O	aN-	Na	K	HCONH	CHOCH	C2O	SiH3	SiH2O	CH=C=CH	CH=C=C	OP(=S)O	R	CF2cyc	CFcyc	H2O	success	status
[[2. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [1. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [2. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 1. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]]
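
2. Property Calculation by the Group Contribution Method

Based on the fragmentation result, more than 20 molecular properties (molecular weight, melting point, boiling point, critical parameters and others) are calculated from predefined group contribution values, stored in the SQLite database GC_MG1_DB.db, using the correlations implemented in the functions of this section.

import sqlite3
from math import log, exp, sqrt
from time import time
import os

Define the function that reads the group contribution data

The function connects to the SQLite database and reads the contribution of each group to each property (molecular weight, melting point and so on).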
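
A minimal sketch of this read step, run against a throwaway database so it can be tried without access to `/share/GC`. The helper name `read_table`, the toy file `toy.db` and its three-column layout (id, name, Mw) are assumptions for illustration; the real GC table has many more columns.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def read_table(db_path, table="GC"):
    """Return all rows of `table` as a list of tuples."""
    # Table names cannot be bound as SQL parameters, so validate first.
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(f"SELECT * FROM {table}").fetchall()

# Build a toy database with two groups (hypothetical layout).
path = os.path.join(tempfile.mkdtemp(), "toy.db")
with closing(sqlite3.connect(path)) as conn:
    conn.execute("CREATE TABLE GC (id INTEGER, name TEXT, Mw REAL)")
    conn.executemany(
        "INSERT INTO GC VALUES (?, ?, ?)",
        [(1, "CH3", 15.035), (2, "CH2", 14.027)],
    )
    conn.commit()

rows = read_table(path)
print(rows[0][2])  # molecular-weight column of the first group
```

`contextlib.closing` guarantees the connection is closed even if the query raises, which the plain connect/close pair does not.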

def read_database():
    # Connect to the database
    conn = sqlite3.connect(os.path.join('/share/GC', 'GC_MG1_DB.db'))
    cur = conn.cursor()
    # Read the whole GC table: one row per group, one column per property
    cur.execute("""
    select * from GC;
    """)
    GC = cur.fetchall()

    conn.commit()
    conn.close()

    return GC
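
Define the valid range of predicted values

Values outside this range are reported as NaN.

def format1(num, fmt):
    # Values below -1e6 or above 1e10 (including the -1e10 error sentinel) are reported as 'NaN'
    if num < -1e6 or num > 1e10:
        return 'NaN'
    else:
        return format(num, fmt)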
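
Molecular weight

$M_w = \sum_i N_i C_i$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.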
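
As a worked example of this sum, the molecular weight of butane (2 CH3 + 2 CH2) can be checked by hand. The group weights below are computed from standard atomic masses (C = 12.011, H = 1.008); that the database stores the same values is an assumption, and `molecular_weight` is a hypothetical helper.

```python
# Group weights in g/mol from atomic masses; assumed to match the database.
GROUP_MW = {"CH3": 15.035, "CH2": 14.027}

def molecular_weight(counts, group_mw=GROUP_MW):
    """Sum of N_i * C_i over the groups present in the molecule."""
    return sum(n * group_mw[group] for group, n in counts.items())

mw_butane = molecular_weight({"CH3": 2, "CH2": 2})
print(round(mw_butane, 3))  # 58.124
```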

def GC_Mw(len_g, len_m, Groups_result, GC):
    Mw = [0 for i in range(len_m)]

    for i in range(len_m):
        for j in range(len_g):
            # GC[j][2]: molecular-weight contribution of group j
            Mw[i] += Groups_result[0][i][j] * float(GC[j][2])

    return Mw
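
Melting point

$T_m = 143.5726 \ln\left(\sum_i N_i C_i\right)$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Tm(len_g, len_m, Groups_result, GC):
    Tm = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][3]: melting-point contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][3])
        Tm[i] = 143.5726 * log(temp)
    return Tm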
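
The logarithm in GC_Tm can be read as the inverse of an exponential property function, the form Marrero and Gani use for the melting point. A short derivation follows, keeping only the single summation that the code evaluates; whether the database columns also carry higher-order contributions is not stated here, so treating the sum as one undifferentiated term is an assumption.

```latex
\begin{aligned}
\exp\!\left(\frac{T_m}{T_{m0}}\right) &= \sum_i N_i C_i \\
\Longrightarrow\quad
T_m &= T_{m0}\,\ln\!\left(\sum_i N_i C_i\right),
\qquad T_{m0} = 143.5726\ \mathrm{K}
\end{aligned}
```

The sum must be positive for the logarithm to be defined. The boiling point and critical temperature functions have the same shape with different constants.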
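
Boiling point

$T_b = 244.7889 \ln\left(\sum_i N_i C_i\right)$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Tb(len_g, len_m, Groups_result, GC):
    Tb = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][4]: boiling-point contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][4])
        Tb[i] = 244.7889 * log(temp)
    return Tb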
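
Critical temperature

$T_c = 181.1926 \ln\left(\sum_i N_i C_i\right)$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Tc(len_g, len_m, Groups_result, GC):
    Tc = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][5]: critical-temperature contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][5])
        Tc[i] = 181.1926 * log(temp)
    return Tc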
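
Critical pressure

$P_c = \left(\sum_i N_i C_i + 0.1346\right)^{-2} + 0.0519$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Pc(len_g, len_m, Groups_result, GC):
    Pc = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][6]: critical-pressure contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][6])
        Pc[i] = pow(1 / (temp + 0.1346), 2) + 0.0519
    return Pc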
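
Critical volume

$V_c = \sum_i N_i C_i + 28.0018$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Vc(len_g, len_m, Groups_result, GC):
    Vc = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][7]: critical-volume contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][7])
        Vc[i] = temp + 28.0018
    return Vc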
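
Standard molar Gibbs energy of formation

$G_f = \sum_i N_i C_i - 1.3385$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Gf(len_g, len_m, Groups_result, GC):
    Gf = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][8]: Gibbs-energy-of-formation contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][8])
        Gf[i] = temp - 1.3385
    return Gf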
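
Standard molar enthalpy of formation

$H_f = \sum_i N_i C_i + 35.1774$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Hf(len_g, len_m, Groups_result, GC):
    Hf = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][9]: enthalpy-of-formation contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][9])
        Hf[i] = temp + 35.1774
    return Hf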
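
Enthalpy of vaporization (298 K)

$H_v = \sum_i N_i C_i + 9.6127$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Hv(len_g, len_m, Groups_result, GC):
    Hv = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][10]: contribution of group j to the enthalpy of vaporization at 298 K
            temp += Groups_result[0][i][j] * float(GC[j][10])
        Hv[i] = temp + 9.6127
    return Hv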
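
Enthalpy of fusion

$H_{fus} = \sum_i N_i C_i + 4.50666$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Hfus(len_g, len_m, Groups_result, GC):
    Hfus = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][11]: enthalpy-of-fusion contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][11])
        Hfus[i] = temp + 4.50666
    return Hfus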
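
Enthalpy of vaporization (at Tb)

$H_{vb} = \sum_i N_i C_i + 15.0884$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Hvb(len_g, len_m, Groups_result, GC):
    Hvb = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][15]: contribution of group j to the enthalpy of vaporization at Tb
            temp += Groups_result[0][i][j] * float(GC[j][15])
        Hvb[i] = temp + 15.0884
    return Hvb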
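
Hildebrand solubility parameter

$\delta = \sum_i N_i C_i + 20.7339$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Solp(len_g, len_m, Groups_result, GC):
    Solp = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][24]: Hildebrand-solubility-parameter contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][24])
        Solp[i] = temp + 20.7339
    return Solp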
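
Flash point

$F_p = \sum_i N_i C_i + 170.7058$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Fp(len_g, len_m, Groups_result, GC):
    Fp = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][26]: flash-point contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][26])
        Fp[i] = temp + 170.7058
    return Fp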
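
Surface tension (298 K)

$\sigma = \sum_i N_i C_i$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_St(len_g, len_m, Groups_result, GC):
    St = [0 for i in range(len_m)]

    for i in range(len_m):
        for j in range(len_g):
            # GC[j][27]: surface-tension contribution of group j
            St[i] += Groups_result[0][i][j] * float(GC[j][27])
    return St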
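
Hansen dispersion solubility parameter

$\delta_D = \sum_i N_i C_i$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_hspD(len_g, len_m, Groups_result, GC):
    hspD = [0 for i in range(len_m)]

    for i in range(len_m):
        for j in range(len_g):
            # GC[j][32]: Hansen dispersion contribution of group j
            hspD[i] += Groups_result[0][i][j] * float(GC[j][32])
    return hspD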
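
Hansen polar solubility parameter

$\delta_P = \sum_i N_i C_i$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_hspP(len_g, len_m, Groups_result, GC):
    hspP = [0 for i in range(len_m)]

    for i in range(len_m):
        for j in range(len_g):
            # GC[j][33]: Hansen polar contribution of group j
            hspP[i] += Groups_result[0][i][j] * float(GC[j][33])
    return hspP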
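
Hansen hydrogen-bond solubility parameter

$\delta_H = \sum_i N_i C_i$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_hspH(len_g, len_m, Groups_result, GC):
    hspH = [0 for i in range(len_m)]

    for i in range(len_m):
        for j in range(len_g):
            # GC[j][34]: Hansen hydrogen-bond contribution of group j
            hspH[i] += Groups_result[0][i][j] * float(GC[j][34])
    return hspH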
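
Viscosity

$\eta = \exp\left(\sum_i N_i C_i\right)$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_visc(len_g, len_m, Groups_result, GC):
    visc = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][35]: viscosity contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][35])
        visc[i] = exp(temp)
    return visc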
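
Acentric factor

$\omega = 0.9132 \left[\ln\left(\sum_i N_i C_i + 1.0039\right)\right]^{0.0447}$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Acentric(len_g, len_m, Groups_result, GC):
    Acentric = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][36]: acentric-factor contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][36])
        Acentric[i] = 0.9132 * pow(log(temp + 1.0039), 0.0447)
    return Acentric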
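
Liquid molar volume (298 K)

$V_m = 1000 \left(\sum_i N_i C_i + 0.0123\right)$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_Vm_298(len_g, len_m, Groups_result, GC):
    Vm_298 = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][37]: liquid-molar-volume contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][37])
        Vm_298[i] = 1000 * (temp + 0.0123)
    return Vm_298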
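
$LC_{50}$ (Fathead Minnow 96-hr)

$-\log LC_{50} = \sum_i N_i C_i + 2.18$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_nlogLC50FM(len_g, len_m, Groups_result, GC):
    nlogLC50FM = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][38]: fathead-minnow toxicity contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][38])
        nlogLC50FM[i] = temp + 2.18
    return nlogLC50FM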
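
$LC_{50}$ (Daphnia Magna 48-hr)

$-\log LC_{50} = \sum_i N_i C_i + 3.59$

where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$ to this property.

def GC_nlogLC50DM(len_g, len_m, Groups_result, GC):
    nlogLC50DM = [0 for i in range(len_m)]

    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            # GC[j][39]: Daphnia magna toxicity contribution of group j
            temp += Groups_result[0][i][j] * float(GC[j][39])
        nlogLC50DM[i] = temp + 3.59
    return nlogLC50DM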
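
Diffusion coefficient

$V_b = 0.285\,V_c^{1.048}$, $\quad x = \exp\left(-24.71 + \dfrac{4209}{T} + 0.04527\,T - 3.376\times10^{-5}\,T^2\right)$, $\quad D_w = 0.01955\,V_b^{-0.433}\,\dfrac{T}{x}$

where $V_c$ is the critical volume and $T$ is the temperature.

def GC_Dw(len_m, Temperature, Vc):
    Dw = [0 for i in range(len_m)]

    # Temperature-dependent factor shared by all molecules
    x_Dw = exp(-24.71 + 4209 / Temperature + 0.04527 * Temperature - 0.00003376 * Temperature * Temperature)
    for i in range(len_m):
        try:
            vb = 0.285 * exp(1.048 * log(Vc[i]))
            Dw[i] = exp(log(.01955) - 0.433 * log(vb)) * Temperature / x_Dw
        except (ArithmeticError, ValueError):
            # -1e10 is out of range for format1, so it is printed as NaN
            Dw[i] = -1e10

    return Dw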
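
Thermal conductivity

$\lambda = \dfrac{1.11}{\sqrt{M_w}} \cdot \dfrac{3 + 20\,|1 - T_r|^{0.6666}}{3 + 20\,|1 - T_{br}|^{0.6666}}$, $\quad T_r = T/T_c$, $\quad T_{br} = T_b/T_c$

where $T_c$ is the critical temperature, $T_b$ is the boiling point and $M_w$ is the molecular weight.

def GC_lambda(len_m, Temperature, Tc, Tb, Mw):
    lambda1 = [0 for i in range(len_m)]

    for i in range(len_m):
        try:
            Tr = Temperature / Tc[i]   # reduced temperature
            Tbr = Tb[i] / Tc[i]        # reduced boiling point
            temp_Tr = exp(0.6666 * log(abs(1-Tr)))
            temp_Tbr = exp(0.6666 * log(abs(1 - Tbr)))
            lambda1[i] = 1.11 * (3 + 20 * temp_Tr) / (3 + 20 * temp_Tbr) / sqrt(Mw[i])
        except (ArithmeticError, ValueError):
            # -1e10 is out of range for format1, so it is printed as NaN
            lambda1[i] = -1e10

    return lambda1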
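
A single-compound sketch of the same correlation, which has the form commonly referred to as the Sato-Riedel method. The function name `sato_riedel` and the input values are illustrative assumptions, not database values. The sketch uses the exact exponent 2/3 where the notebook uses 0.6666, and it assumes T < Tc and Tb < Tc instead of taking absolute values.

```python
from math import sqrt

def sato_riedel(temperature, tc, tb, mw):
    """Liquid thermal conductivity in W/(m K); requires T < Tc and Tb < Tc."""
    tr = temperature / tc
    tbr = tb / tc
    ratio = (3 + 20 * (1 - tr) ** (2 / 3)) / (3 + 20 * (1 - tbr) ** (2 / 3))
    return 1.11 / sqrt(mw) * ratio

# Illustrative inputs: Tc = 600 K, Tb = 400 K, Mw = 100 g/mol.
at_tb = sato_riedel(400.0, 600.0, 400.0, 100.0)
at_room = sato_riedel(298.15, 600.0, 400.0, 100.0)
print(at_tb, at_room)
```

At T = Tb the temperature ratio equals one, so the result reduces to 1.11 / sqrt(Mw); below Tb the predicted conductivity is higher.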
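
Saturated vapor pressure

def GC_Psat(len_m, Temperature, Pc, Tc, Tb):
    Psat = [0 for i in range(len_m)]
    Omega = [0 for i in range(len_m)]

    for i in range(len_m):
        try:
            Tr = Temperature / Tc[i]   # reduced temperature
            Tbr = Tb[i] / Tc[i]        # reduced boiling point
            Pr = 1 / Pc[i]             # reduced pressure at the boiling point (Pc in bar)
            # Lee-Kesler functions evaluated at the boiling point give Omega
            F0 = 5.92714 - 6.09648 / Tbr - 1.28862 * log(Tbr) + 0.169347 * pow(Tbr, 6)
            F1 = 15.2518 - 15.6875 / Tbr - 13.4721 * log(Tbr) + 0.43577 * pow(Tbr, 6)
            Omega[i] = (log(Pr) - F0) / F1
            # The same functions evaluated at the target temperature give Psat
            F01 = 5.92714 - 6.09648 / Tr - 1.28862 * log(Tr) + 0.169347 * pow(Tr, 6)
            F11 = 15.2518 - 15.6875 / Tr - 13.4721 * log(Tr) + 0.43577 * pow(Tr, 6)
            Psat[i] = 1e5 * (Pc[i] * exp(F01 + Omega[i] * F11))   # bar -> Pa
        except (ArithmeticError, ValueError):
            # -1e10 is out of range for format1, so it is printed as NaN
            Psat[i] = -1e10

    return Psat, Omega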
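
A useful sanity check on this Lee-Kesler form is that it is anchored at the normal boiling point: at T = Tb the predicted pressure is 1 bar (1e5 Pa) whatever the inputs. The sketch below restates the calculation for one compound; the function names and the input values are illustrative assumptions, not database values.

```python
from math import exp, log

def lee_kesler_terms(tr):
    """Lee-Kesler functions f0 and f1 at reduced temperature tr."""
    f0 = 5.92714 - 6.09648 / tr - 1.28862 * log(tr) + 0.169347 * tr ** 6
    f1 = 15.2518 - 15.6875 / tr - 13.4721 * log(tr) + 0.43577 * tr ** 6
    return f0, f1

def vapor_pressure_pa(temperature, tc, pc_bar, tb):
    """Return (Psat in Pa, omega) for one compound."""
    f0_b, f1_b = lee_kesler_terms(tb / tc)
    omega = (log(1.0 / pc_bar) - f0_b) / f1_b
    f0, f1 = lee_kesler_terms(temperature / tc)
    return 1e5 * pc_bar * exp(f0 + omega * f1), omega

# Illustrative inputs: Tc = 600 K, Pc = 30 bar, Tb = 400 K.
p_at_tb, omega = vapor_pressure_pa(400.0, 600.0, 30.0, 400.0)
p_at_300, _ = vapor_pressure_pa(300.0, 600.0, 30.0, 400.0)
print(round(p_at_tb), p_at_300 < p_at_tb)
```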
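
Density

def GC_rho(len_m, Temperature, Omega, Tc, Pc, Mw):
    rho = [0 for i in range(len_m)]

    for i in range(len_m):
        try:
            # Rackett-type equation for the saturated liquid molar volume
            Zra = 0.29056 - 0.08775 * Omega[i]
            Tr = Temperature / Tc[i]
            temp = exp(0.285714 * log(1 - Tr)) + 1   # 1 + (1 - Tr)^(2/7)
            temp1 = exp(temp * log(Zra))             # Zra^(1 + (1 - Tr)^(2/7))
            rho[i] = Mw[i] * Pc[i] / 83.14 / Tc[i] / temp1   # R = 83.14 cm^3 bar/(mol K)
        except (ArithmeticError, ValueError):
            # -1e10 is out of range for format1, so it is printed as NaN
            rho[i] = -1e10

    return rho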
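
The density function above can be restated for a single compound as a Rackett-type calculation. The sketch below is illustrative: `rackett_density` is a hypothetical helper, and the benzene inputs are approximate literature values, not values taken from the notebook's database.

```python
def rackett_density(temperature, tc, pc_bar, omega, mw):
    """Saturated-liquid density in g/cm^3; requires T < Tc."""
    gas_constant = 83.14                     # cm^3 bar / (mol K)
    z_ra = 0.29056 - 0.08775 * omega
    tr = temperature / tc
    exponent = 1.0 + (1.0 - tr) ** (2.0 / 7.0)
    molar_volume = gas_constant * tc / pc_bar * z_ra ** exponent   # cm^3/mol
    return mw / molar_volume

# Approximate literature constants for benzene (assumed for illustration).
rho_benzene = rackett_density(298.15, tc=562.05, pc_bar=48.95, omega=0.2103, mw=78.11)
print(round(rho_benzene, 2))
```

The result lands within a few percent of the measured liquid density of benzene at room temperature, which is a typical accuracy for this kind of corresponding-states estimate.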
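
Solubility parameter

def GC_SolPar(len_m, Temperature, Omega, Pc, Tc, Hv):
    SolPar = [0 for i in range(len_m)]

    for i in range(len_m):
        try:
            # Liquid molar volume from the same Rackett-type equation as GC_rho
            Zra = 0.29056 - 0.08775 * Omega[i]
            Tr = Temperature / Tc[i]
            temp = exp(0.285714 * log(1 - Tr)) + 1
            temp1 = exp(temp * log(Zra))
            Vm = 83.14 * Tc[i] * temp1 / Pc[i]   # cm^3/mol
            # sqrt of the cohesive energy density: (Hv - R*T) / Vm, with Hv converted from kJ/mol to J/mol
            SolPar[i] = sqrt((1000 * Hv[i] - 8.314 * Temperature) / Vm)
        except (ArithmeticError, ValueError):
            # -1e10 is out of range for format1, so it is printed as NaN
            SolPar[i] = -1e10

    return SolPar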
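
Define the group contribution function that calculates all properties

This function combines all the property functions and calculates every target property for the input molecules in one batch.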
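
Most of the property functions differ only in the database column they read and the transform applied to the weighted sum, so they could also be driven from a small table. The sketch below shows that design for three properties; the names `PROPERTY_SPECS`, `weighted_sum` and `predict` are hypothetical, and the toy contribution values are made up.

```python
from math import exp, log

# (column index in the GC table, transform applied to the weighted sum);
# the three entries mirror GC_Tm, GC_Vc and GC_visc.
PROPERTY_SPECS = {
    "Tm": (3, lambda s: 143.5726 * log(s)),
    "Vc": (7, lambda s: s + 28.0018),
    "visc": (35, exp),
}

def weighted_sum(counts_row, gc_rows, column):
    """Sum of N_i * C_i over all groups for one molecule."""
    return sum(n * float(row[column]) for n, row in zip(counts_row, gc_rows))

def predict(name, counts_row, gc_rows):
    column, transform = PROPERTY_SPECS[name]
    return transform(weighted_sum(counts_row, gc_rows, column))

# Toy table with two groups and made-up Vc contributions in column 7.
gc_rows = [[0.0] * 36 for _ in range(2)]
gc_rows[0][7] = 10.0
gc_rows[1][7] = 20.0
vc = predict("Vc", [2, 2], gc_rows)
print(round(vc, 4))  # 88.0018
```

Adding a property then means adding one table entry, at the cost of making each formula slightly less visible than a dedicated function.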

def GroupContribution(len_g, len_m, Groups_result, GC, Temperature):
    # Properties obtained directly from group contributions
    Mw = GC_Mw(len_g, len_m, Groups_result, GC)
    Tm = GC_Tm(len_g, len_m, Groups_result, GC)
    Tb = GC_Tb(len_g, len_m, Groups_result, GC)
    Tc = GC_Tc(len_g, len_m, Groups_result, GC)
    Pc = GC_Pc(len_g, len_m, Groups_result, GC)
    Vc = GC_Vc(len_g, len_m, Groups_result, GC)
    Gf = GC_Gf(len_g, len_m, Groups_result, GC)
    Hf = GC_Hf(len_g, len_m, Groups_result, GC)
    Hv = GC_Hv(len_g, len_m, Groups_result, GC)
    Hfus = GC_Hfus(len_g, len_m, Groups_result, GC)
    Hvb = GC_Hvb(len_g, len_m, Groups_result, GC)
    Solp = GC_Solp(len_g, len_m, Groups_result, GC)
    Fp = GC_Fp(len_g, len_m, Groups_result, GC)
    St = GC_St(len_g, len_m, Groups_result, GC)
    hspD = GC_hspD(len_g, len_m, Groups_result, GC)
    hspP = GC_hspP(len_g, len_m, Groups_result, GC)
    hspH = GC_hspH(len_g, len_m, Groups_result, GC)
    visc = GC_visc(len_g, len_m, Groups_result, GC)
    Acentric = GC_Acentric(len_g, len_m, Groups_result, GC)
    Vm_298 = GC_Vm_298(len_g, len_m, Groups_result, GC)
    nlogLC50FM = GC_nlogLC50FM(len_g, len_m, Groups_result, GC)
    nlogLC50DM = GC_nlogLC50DM(len_g, len_m, Groups_result, GC)
    # Temperature-dependent properties derived from the values above
    Dw = GC_Dw(len_m, Temperature, Vc)
    lambda1 = GC_lambda(len_m, Temperature, Tc, Tb, Mw)
    Psat, Omega = GC_Psat(len_m, Temperature, Pc, Tc, Tb)
    rho = GC_rho(len_m, Temperature, Omega, Tc, Pc, Mw)
    SolPar = GC_SolPar(len_m, Temperature, Omega, Pc, Tc, Hv)

    return Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar

Define the result output function

The function formats the calculated results for readability and reports values outside the valid range as NaN.

def print_results(molecules, len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar):
    #print('\n********************************************************************************\n')
    #print('\nGroup Segments:\n--------------------------------------------------------------------------------')
    #print(Groups_result[1])
    # display results
    property_prediction = '\n********************************************************************************\n'
    property_prediction += '\nProperty prediction (Group contribution):\n--------------------------------------------------------------------------------\n'
    for i in range(len_m):
        property_prediction += str(i + 1) + '\t' + molecules[i] + '\n'
        # The second-to-last column of the group matrix is the success flag
        if Groups_result[0][i][-2] != 1:
            property_prediction += 'Warning: Group segment fails for this molecule, property prediction fails!\n'
        property_prediction += 'Mw: Molecular weight (g/mol)\t\t\t' + format1(Mw[i], '.4f') + '\n'
        property_prediction += 'Tm: Normal melting point (K)\t\t\t' + format1(Tm[i], '.4f') + '\n'
        property_prediction += 'Tb: Normal boiling point (K)\t\t\t' + format1(Tb[i], '.4f') + '\n'
        property_prediction += 'Tc: Critical temperature (K)\t\t\t' + format1(Tc[i], '.4f') + '\n'
        property_prediction += 'Pc: Critical pressure (bar)\t\t\t' + format1(Pc[i], '.4f') + '\n'
        property_prediction += 'Vc: Critical volume (cm^3/mol)\t\t\t' + format1(Vc[i], '.4f') + '\n'
        property_prediction += 'Gf: Standard Gibbs free energy of formation\t' + format1(Gf[i], '.4f') + '\n'
        property_prediction += '    at 298K (kJ/mol)\n'
        property_prediction += 'Hf: Standard enthalpy of formation\t\t' + format1(Hf[i], '.4f') + '\n'
        property_prediction += '    at 298K (kJ/mol)\n'
        property_prediction += 'Hv: Enthalpy of vaporization at 298K (kJ/mol)\t' + format1(Hv[i], '.4f') + '\n'
        property_prediction += 'Hfus: Enthalpy of fusion (kJ/mol)\t\t' + format1(Hfus[i], '.4f') + '\n'
        property_prediction += 'Hvb: Enthalpy of vaporization at Tb (kJ/mol)\t' + format1(Hvb[i], '.4f') + '\n'
        property_prediction += 'Solp: Hildebrand solubility parameter\t\t' + format1(Solp[i], '.4f') + '\n'
        property_prediction += '      at 298K (MPa^0.5)\n'
        property_prediction += 'Fp: Flash point (K)\t\t\t\t' + format1(Fp[i], '.4f') + '\n'
        property_prediction += 'St: Surface tension at 298K (dym/cm)\t\t' + format1(St[i], '.4f') + '\n'
        property_prediction += 'hspD: Hansen dispersive solubility parameter\t' + format1(hspD[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'hspP: Hansen polar solubility parameter\t\t' + format1(hspP[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'hspH: Hansen Hydrogen-bond solubility parameter\t' + format1(hspH[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'visc: Viscosity (cP)\t\t\t\t' + format1(visc[i], '.4f') + '\n'
        property_prediction += 'Acentric: Pitzer\'s Acentric Factor\t\t' + format1(Acentric[i], '.4f') + '\n'
        property_prediction += 'Vm_298: Liquid molar volume at 298K (cm^3/mol)\t' + format1(Vm_298[i], '.4f') + '\n'
        property_prediction += 'nlogLC50FM: Fathead Minnow 96-hr LC50\t\t' + format1(nlogLC50FM[i], '.4f') + '\n'
        property_prediction += '            (-log(mol/L))\n'
        property_prediction += 'nlogLC50DM: Daphnia Magna 48-hr LC50\t\t' + format1(nlogLC50DM[i], '.4f') + '\n'
        property_prediction += '            (-log(mol/L))\n'
        property_prediction += 'Dw: Diffusion coefficient (cm/s)\t\t' + format1(Dw[i], '.4f') + '\n'
        property_prediction += 'lambda: Thermal conductivity (W/m/K)\t\t' + format1(lambda1[i], '.4f') + '\n'
        property_prediction += 'Psat: Vapor pressure (Pa)\t\t\t' + format1(Psat[i], '.4f') + '\n'
        property_prediction += 'Omega: Compressibility factor\t\t\t' + format1(Omega[i], '.4f') + '\n'
        property_prediction += 'rho: Density (g/cm^3)\t\t\t\t' + format1(rho[i], '.4f') + '\n'
        property_prediction += 'SolPar: Solubility parameter (MPa^0.5)\t\t' + format1(SolPar[i], '.4f') + '\n'
        property_prediction += '--------------------------------------------------------------------------------\n'

    print(property_prediction)

    return property_prediction
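
Define the main function of the group contribution method

File structure

  • Input: the SMILES list (input.txt), the group SMARTS data (Group_SMARTS.npy) and the database (GC_MG1_DB.db).
  • Output: the fragmentation result (Group_output.txt) and the property prediction report (printed to the console).

Main function workflow

  • Pass the SMILES strings and the temperature to the PyGC main function.
  • Run the group fragmentation to obtain the group counts and status codes.
  • Read the database and calculate each property.
  • Format and print the results, including the wall time.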
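
The main function returns one list per property, in a fixed order. When the results are consumed by other code rather than printed, it can be convenient to regroup them into one record per molecule. A minimal sketch of that idea follows; `to_records` is a hypothetical helper, the property names are those used in this notebook, and the demo values are toy numbers.

```python
PROPERTY_NAMES = [
    "Mw", "Tm", "Tb", "Tc", "Pc", "Vc", "Gf", "Hf", "Hv", "Hfus", "Hvb",
    "Solp", "Fp", "St", "hspD", "hspP", "hspH", "visc", "Acentric",
    "Vm_298", "nlogLC50FM", "nlogLC50DM", "Dw", "lambda1", "Psat",
    "Omega", "rho", "SolPar",
]

def to_records(molecules, property_lists, names=PROPERTY_NAMES):
    """Turn per-property lists into one dict per molecule."""
    if len(property_lists) != len(names):
        raise ValueError("expected one list per property name")
    return [
        {"smiles": smi, **{n: values[i] for n, values in zip(names, property_lists)}}
        for i, smi in enumerate(molecules)
    ]

# Toy demonstration with two properties and made-up values.
records = to_records(["CCCC"], [[58.124], [135.0]], names=["Mw", "Tm"])
print(records)
```

With the PyGC function of this notebook, the call would look like `to_records(molecules, PyGC(molecules, 298.15)[2:])`, since the first two returned items are the molecule count and the fragmentation result.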

def PyGC(molecules, Temperature):
    print('\n********************************************************************************\n')
    print('> PyGC: Group Contribution based Property Prediction')
    print('> Lei Zhang (keleiz@dlut.edu.cn); Qilei Liu (liuqilei@dlut.edu.cn)')
    print('> Mar. 22, 2025')
    print('> Institute of chemical process systems engineering')
    print('> School of chemical engineering')
    print('> Dalian University of Technology, Dalian 116024, China')

    # group segment
    Groups_result = SMILES2Group(molecules)
    # read group contribution database
    GC = read_database()

    len_g = len(GC)
    len_m = len(molecules)

    # property prediction
    Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar = GroupContribution(len_g, len_m, Groups_result, GC, Temperature)
    return len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar

Test

if __name__ == "__main__":
    start_time = time()

    # input data
    molecules = ['CC(=O)OC1=CC=CC=C1C(O)=O', 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O', 'CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1', 'Cc1c([N+](=O)[O-])cc([N+](=O)[O-])cc1[N+](=O)[O-]', 'C(=C(C(=Cc1ccccc1)c1ccccc1)c1ccccc1)c1ccccc1']
    Temperature = 298.15
    #try:
    # property prediction
    len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar = PyGC(molecules, Temperature)
    # print results
    print_results(molecules, len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar)
    #except:
    #    print('\n> There is something wrong with the input SMILES!')
    end_time = time()
    print('\nWALL TIME:\t', format(end_time - start_time, '.4f'), ' (s)\n')
********************************************************************************

> PyGC: Group Contribution based Property Prediction
> Lei Zhang (keleiz@dlut.edu.cn); Qilei Liu (liuqilei@dlut.edu.cn)
> Mar. 22, 2025
> Institute of chemical process systems engineering
> School of chemical engineering
> Dalian University of Technology, Dalian 116024, China

********************************************************************************

Property prediction (Group contribution):
--------------------------------------------------------------------------------
1	CC(=O)OC1=CC=CC=C1C(O)=O
Mw: Molecular weight (g/mol)			180.1597
Tm: Normal melting point (K)			436.6321
Tb: Normal boiling point (K)			592.0224
Tc: Critical temperature (K)			891.4422
Pc: Critical pressure (bar)			34.3293
Vc: Critical volume (cm^3/mol)			471.2066
Gf: Standard Gibbs free energy of formation	-532.5249
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-680.0922
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	NaN
Hfus: Enthalpy of fusion (kJ/mol)		40.4855
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		20.9713
      at 298K (MPa^0.5)
Fp: Flash point (K)				483.2930
St: Surface tension at 298K (dym/cm)		NaN
hspD: Hansen dispersive solubility parameter	21.0164
      (MPa^0.5)
hspP: Hansen polar solubility parameter		6.0868
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	9.8884
      (MPa^0.5)
visc: Viscosity (cP)				0.4874
Acentric: Pitzer's Acentric Factor		0.8252
Vm_298: Liquid molar volume at 298K (cm^3/mol)	155.6850
nlogLC50FM: Fathead Minnow 96-hr LC50		2.7027
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.8246
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.6770
lambda: Thermal conductivity (W/m/K)		0.1191
Psat: Vapor pressure (Pa)			0.4547
Omega: Compressibility factor			0.2928
rho: Density (g/cm^3)				1.0280
SolPar: Solubility parameter (MPa^0.5)		NaN
--------------------------------------------------------------------------------
2	CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
Mw: Molecular weight (g/mol)			206.2852
Tm: Normal melting point (K)			343.3547
Tb: Normal boiling point (K)			582.2050
Tc: Critical temperature (K)			786.3869
Pc: Critical pressure (bar)			22.6432
Vc: Critical volume (cm^3/mol)			663.4145
Gf: Standard Gibbs free energy of formation	-187.4798
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-448.4679
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	77.6696
Hfus: Enthalpy of fusion (kJ/mol)		28.0466
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		19.2869
      at 298K (MPa^0.5)
Fp: Flash point (K)				446.5630
St: Surface tension at 298K (dym/cm)		28.7115
hspD: Hansen dispersive solubility parameter	17.6759
      (MPa^0.5)
hspP: Hansen polar solubility parameter		2.0559
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	4.3485
      (MPa^0.5)
visc: Viscosity (cP)				13.4884
Acentric: Pitzer's Acentric Factor		0.8248
Vm_298: Liquid molar volume at 298K (cm^3/mol)	202.2285
nlogLC50FM: Fathead Minnow 96-hr LC50		3.9587
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		4.2037
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.5796
lambda: Thermal conductivity (W/m/K)		0.1218
Psat: Vapor pressure (Pa)			0.0451
Omega: Compressibility factor			0.6762
rho: Density (g/cm^3)				1.1090
SolPar: Solubility parameter (MPa^0.5)		20.1055
--------------------------------------------------------------------------------
3	CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1
Mw: Molecular weight (g/mol)			239.3089
Tm: Normal melting point (K)			408.2666
Tb: Normal boiling point (K)			642.1434
Tc: Critical temperature (K)			835.7446
Pc: Critical pressure (bar)			28.0000
Vc: Critical volume (cm^3/mol)			713.2700
Gf: Standard Gibbs free energy of formation	-201.2924
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-551.1721
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	151.4397
Hfus: Enthalpy of fusion (kJ/mol)		38.3394
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		25.0683
      at 298K (MPa^0.5)
Fp: Flash point (K)				578.0831
St: Surface tension at 298K (dyn/cm)		38.4238
hspD: Hansen dispersive solubility parameter	17.8190
      (MPa^0.5)
hspP: Hansen polar solubility parameter		7.7930
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	24.9708
      (MPa^0.5)
visc: Viscosity (cP)				5568.3835
Acentric: Pitzer's Acentric Factor		0.8459
Vm_298: Liquid molar volume at 298K (cm^3/mol)	206.8158
nlogLC50FM: Fathead Minnow 96-hr LC50		3.1616
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.5715
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.5609
lambda: Thermal conductivity (W/m/K)		0.1218
Psat: Vapor pressure (Pa)			0.0000
Omega: Compressibility factor			1.1138
rho: Density (g/cm^3)				2.1342
SolPar: Solubility parameter (MPa^0.5)		36.4482
--------------------------------------------------------------------------------
4	Cc1c([N+](=O)[O-])cc([N+](=O)[O-])cc1[N+](=O)[O-]
Mw: Molecular weight (g/mol)			227.1332
Tm: Normal melting point (K)			396.5192
Tb: Normal boiling point (K)			633.1089
Tc: Critical temperature (K)			862.9637
Pc: Critical pressure (bar)			31.2272
Vc: Critical volume (cm^3/mol)			578.2647
Gf: Standard Gibbs free energy of formation	196.8224
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-40.1631
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	99.9948
Hfus: Enthalpy of fusion (kJ/mol)		32.1185
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		22.3972
      at 298K (MPa^0.5)
Fp: Flash point (K)				554.0767
St: Surface tension at 298K (dyn/cm)		64.1788
hspD: Hansen dispersive solubility parameter	20.3463
      (MPa^0.5)
hspP: Hansen polar solubility parameter		11.4278
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	5.3196
      (MPa^0.5)
visc: Viscosity (cP)				35.2759
Acentric: Pitzer's Acentric Factor		0.8281
Vm_298: Liquid molar volume at 298K (cm^3/mol)	148.2411
nlogLC50FM: Fathead Minnow 96-hr LC50		4.8665
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		4.1870
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.6169
lambda: Thermal conductivity (W/m/K)		0.1180
Psat: Vapor pressure (Pa)			0.0004
Omega: Compressibility factor			0.7850
rho: Density (g/cm^3)				1.6942
SolPar: Solubility parameter (MPa^0.5)		26.9697
--------------------------------------------------------------------------------
5	C(=C(C(=Cc1ccccc1)c1ccccc1)c1ccccc1)c1ccccc1
Mw: Molecular weight (g/mol)			358.4774
Tm: Normal melting point (K)			433.2688
Tb: Normal boiling point (K)			741.3036
Tc: Critical temperature (K)			1061.0020
Pc: Critical pressure (bar)			18.8377
Vc: Critical volume (cm^3/mol)			1195.7405
Gf: Standard Gibbs free energy of formation	742.1677
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		487.1753
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	135.8193
Hfus: Enthalpy of fusion (kJ/mol)		47.9394
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	95.9622
Solp: Hildebrand solubility parameter		19.4211
      at 298K (MPa^0.5)
Fp: Flash point (K)				601.7472
St: Surface tension at 298K (dyn/cm)		NaN
hspD: Hansen dispersive solubility parameter	38.2952
      (MPa^0.5)
hspP: Hansen polar solubility parameter		7.9674
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	4.1192
      (MPa^0.5)
visc: Viscosity (cP)				4.0718
Acentric: Pitzer's Acentric Factor		0.8221
Vm_298: Liquid molar volume at 298K (cm^3/mol)	314.8005
nlogLC50FM: Fathead Minnow 96-hr LC50		5.4598
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.8082
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.4437
lambda: Thermal conductivity (W/m/K)		0.0932
Psat: Vapor pressure (Pa)			0.0026
Omega: Compressibility factor			0.2668
rho: Density (g/cm^3)				0.9526
SolPar: Solubility parameter (MPa^0.5)		18.8236
--------------------------------------------------------------------------------


WALL TIME:	 0.1019  (s)


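Notes

  • Group coverage: if the molecule contains groups that are not defined in the catalog (status code 3 or 4), the prediction is invalid and the group SMARTS library must be extended.
  • Formula limitations: some property formulas were fitted to specific data sets, so extrapolating beyond those data sets may give large errors.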
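
The coverage rule above can be enforced before any property is computed. The following is a minimal sketch, assuming the (counts, success, status) triple returned by smarts_fragment and the status codes described in this notebook (1 = OK, 2 = molecule could not be built, 3 or 4 = incomplete group set). The helper names usable_fragmentation and filter_batch, the group names, and the hand-written example triples are hypothetical and only serve as illustration.

```python
# Status codes as described in this notebook; the message strings are assumptions.
STATUS_MESSAGES = {
    1: "OK",
    2: "failed to construct molecule from SMILES",
    3: "incomplete group coverage",
    4: "incomplete group coverage",
}


def usable_fragmentation(counts, success, status):
    """Return (ok, message) for one (counts, success, status) triple."""
    message = STATUS_MESSAGES.get(status, "unknown status code")
    # Accept a result only if it is flagged successful, has status 1,
    # and actually contains at least one group.
    ok = bool(success) and status == 1 and len(counts) > 0
    return ok, message


def filter_batch(results):
    """Split {smiles: (counts, success, status)} into usable and rejected."""
    usable, rejected = {}, {}
    for smi, (counts, success, status) in results.items():
        ok, message = usable_fragmentation(counts, success, status)
        if ok:
            usable[smi] = counts
        else:
            rejected[smi] = message
    return usable, rejected


# Hand-written example triples (hypothetical, not produced by running the notebook).
example_results = {
    "CCO": ({"CH3": 1, "CH2": 1, "OH": 1}, True, 1),
    "not-a-smiles": ({}, False, 2),
    "C[Si](C)(C)C": ({"CH3": 4}, False, 3),
}

usable, rejected = filter_batch(example_results)
print("usable:", usable)
print("rejected:", rejected)
```

Molecules that end up in rejected should not be passed on to the property formulas; they indicate either an invalid SMILES string or a gap in the group SMARTS library.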

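Summary

This code implements a complete workflow from molecular structure to multi-property prediction. It combines RDKit's molecule handling with the group contribution framework to provide an efficient property prediction tool for chemistry, chemical engineering, and materials science. The core logic consists of SMARTS pattern matching, group counting, formula evaluation, and formatted output of the results, and it is suited to batch analysis of molecular properties.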
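
To make the step from group counts to a property value concrete, here is a minimal sketch of a first-order group contribution estimate for the normal boiling point. It assumes the form exp(Tb / Tb0) = Σ N_i · C_i used by Marrero and Gani (2001), with Tb0 = 222.543 K taken as an assumption that should be checked against the paper. The contribution values and the function name estimate_tb are illustrative placeholders, not the fitted parameters used by this notebook.

```python
import math
from collections import Counter

# Assumed universal constant for Tb in the Marrero-Gani form (K); verify against the paper.
TB0 = 222.543

# Illustrative placeholder contributions, NOT the fitted values from the notebook's data files.
HYPOTHETICAL_TB_CONTRIB = {"CH3": 0.8, "CH2": 0.7, "OH": 2.5}


def estimate_tb(group_counts, contrib=HYPOTHETICAL_TB_CONTRIB, tb0=TB0):
    """First-order estimate: Tb = Tb0 * ln(sum_i N_i * C_i)."""
    missing = [g for g in group_counts if g not in contrib]
    if missing:
        # Mirrors the coverage rule: no contribution value means no prediction.
        raise ValueError(f"no contribution value for groups: {missing}")
    total = sum(n * contrib[g] for g, n in group_counts.items())
    if total <= 0:
        raise ValueError("sum of contributions must be positive to take the logarithm")
    return tb0 * math.log(total)


# Group counts as a fragmentation step might return them for an ethanol-like molecule.
ethanol_like = Counter({"CH3": 1, "CH2": 1, "OH": 1})
tb = estimate_tb(ethanol_like)
print(f"Estimated Tb: {tb:.2f} K")
```

Higher-order terms and the other property formulas follow the same pattern: a property-specific function of the weighted sum of group contributions.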
