
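Group Contribution Method

Cheminformatics and Intelligent Product Engineering

Lei Zhang (张磊)

liuqilei@dlut.edu.cn

Updated 2025-04-15

Recommended image: leiz-dlut:chem

Recommended machine type: c2_m4_cpu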

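1. Group segmentation algorithm

Import the RDKit library

Define functions for substructure search, merging, and counting

Define the group segmentation function and output the segmentation result for each molecule

Test the function

2. Property calculation with the group contribution method

Define a function to read the group contribution values

Define the valid range of predicted values

Relative molecular mass

Melting point

Boiling point

Critical temperature

Critical pressure

Critical volume

Standard molar Gibbs energy of formation

Standard molar enthalpy of formation

Enthalpy of vaporization (298 K)

Enthalpy of fusion

Enthalpy of vaporization (Tb)

Hildebrand solubility parameter

Flash point

Surface tension (298 K)

Hansen dispersion solubility parameter

Hansen polar solubility parameter

Hansen hydrogen-bonding solubility parameter

Viscosity

Acentric factor

Liquid molar volume (298 K)

LC50 (Fathead Minnow 96-hr)

LC50 (Daphnia Magna 48-hr)

Diffusion coefficient

Thermal conductivity

Saturated vapor pressure

Density

Solubility parameter

Define the group contribution function that computes all properties

Define the result output function

Define the main group contribution function

Test the function

Summary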

Join the following project first: bohrium.dp.tech/projects/share/455831

Image: Custom image -> leiz-dlut:chem

Group contribution method (GC method)

Marrero, Jorge A. and Rafiqul Gani. “Group-contribution based estimation of pure component properties.” Fluid Phase Equilibria 183 (2001): 183-208.

https://www.sciencedirect.com/science/article/pii/S0378381201004319

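1. Group segmentation algorithm

Given the SMILES string of a molecule, the molecule is split into a set of groups using predefined group SMARTS patterns, and the name and number of occurrences of each group are counted.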

Import the RDKit library

from __future__ import division

__all__ = ['smarts_fragment']

import numpy
from collections import Counter
import os

try:
    from rdkit import Chem
    hasRDKit = True
except ImportError:  # pragma: no cover
    hasRDKit = False

rdkit_missing = 'RDKit is not installed; it is required to use this functionality'

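Define functions for substructure search, merging, and counting

Substructure matching: RDKit SMARTS matching finds every occurrence of each group in the molecule.
Atom coverage check: every atom must be covered by at least one group; otherwise status code 3 or 4 is returned (the group set is incomplete for this molecule).
Substructure merging: matches are sorted so that larger groups are considered first (to avoid double counting), and mutually non-overlapping groups are then merged so that no atom is left out.
Output: the group counts, the match flag (success), and the status code (status: 1 = OK, 2 = the SMILES could not be parsed, 3 and 4 as above).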

def smarts_fragment(catalog, rdkitmol=None, smi=None):
    """Split a molecule into the groups of `catalog` (dict: group id -> SMARTS).

    Returns (counts, success, status): counts maps group id -> occurrences;
    status is 1 (OK), 2 (failed to construct the molecule), 3 (the full group set
    cannot cover every atom) or 4 (the mutually non-overlapping groups cannot
    cover every atom).
    """
    if not hasRDKit:  # pragma: no cover
        raise Exception(rdkit_missing)
    if rdkitmol is None and smi is None:
        raise Exception('Either an rdkit mol or a smiles string is required')
    if smi is not None:
        rdkitmol = Chem.MolFromSmiles(smi)
    if rdkitmol is None:
        status = 2  # 'Failed to construct mol'
        success = False
        return {}, success, status
    atom_count = rdkitmol.GetNumAtoms()
    status = 1  # 'OK'
    success = True

    # Substructure search: every match of every group pattern
    counts = {}
    all_matches = {}
    for key, smart in catalog.items():
        patt = Chem.MolFromSmarts(smart)
        hits = rdkitmol.GetSubstructMatches(patt)
        if hits:
            all_matches[key] = hits  # keyed by group id, so duplicate SMARTS cannot collide
            counts[key] = len(hits)

    # Atoms covered by at least one match
    matched_atoms = set()
    for hits in all_matches.values():
        for match in hits:
            matched_atoms.update(match)
    if len(matched_atoms) != atom_count:
        # Even with every group, the current group set cannot describe the whole molecule
        status = 3
        success = False
    if not all_matches:
        # Nothing matched at all, so there is nothing to merge
        return {}, success, status

    # Flatten into [group id, matched atom indices] pairs
    Substructure = [[key, match] for key, hits in all_matches.items() for match in hits]

    # Sort so that larger substructures come first (a simple exchange sort)
    def bubble_sort(list_target):
        count = len(list_target)
        for i in range(count):
            for j in range(i + 1, count):
                if len(list_target[i][1]) < len(list_target[j][1]):
                    list_target[i], list_target[j] = list_target[j], list_target[i]
        return list_target

    Substructure = bubble_sort(Substructure)

    # Merge: keep each substructure that does not overlap the ones already kept
    record = [0]
    if matched_atoms > set(Substructure[0][1]):  # otherwise the first substructure is the whole molecule (single-group molecule)
        compare_set = set(Substructure[0][1])
        for i in range(len(Substructure)):
            if not (compare_set & set(Substructure[i][1])):
                compare_set = compare_set | set(Substructure[i][1])
                record.append(i)
        if matched_atoms > compare_set:
            # Mutually independent groups cannot describe the whole molecule
            status = 4
            success = False

    # Count the selected groups
    counts_new = dict(Counter(Substructure[i][0] for i in record))
    return counts_new, success, status

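The merging step can be illustrated without RDKit. This is a minimal sketch, assuming the matches are already available as hypothetical `(group_name, atom_indices)` pairs, similar to what `GetSubstructMatches` returns for each SMARTS pattern. It uses Python's stable `sorted` in place of the exchange sort inside `smarts_fragment`, so equally sized groups may be visited in a different order; the function name `select_groups` and the group `CH2OH` in the demo are hypothetical.

```python
from collections import Counter


def select_groups(matches, n_atoms):
    """Greedy non-overlapping group selection (sketch of the merging step).

    matches: list of (group_name, atom_index_tuple) pairs (hypothetical format)
    n_atoms: number of heavy atoms in the molecule
    Returns (counts, status): 1 = OK, 3 = some atom is never matched,
    4 = the selected non-overlapping groups leave a matched atom uncovered.
    """
    matched = set()
    for _, atoms in matches:
        matched.update(atoms)
    status = 1 if len(matched) == n_atoms else 3

    # Larger groups first, so a multi-atom group wins over its single-atom parts
    ordered = sorted(matches, key=lambda m: len(m[1]), reverse=True)
    covered, chosen = set(), []
    for name, atoms in ordered:
        if covered.isdisjoint(atoms):
            covered.update(atoms)
            chosen.append(name)
    if covered != matched:  # covered is always a subset of matched
        status = 4
    return dict(Counter(chosen)), status


# 1-butanol (CCCCO), atoms 0-4, with a hypothetical two-atom group "CH2OH"
matches = [("CH3", (0,)), ("CH2", (1,)), ("CH2", (2,)), ("CH2", (3,)),
           ("OH", (4,)), ("CH2OH", (3, 4))]
print(select_groups(matches, 5))  # → ({'CH2OH': 1, 'CH3': 1, 'CH2': 2}, 1)
```

The two-atom group is taken first, after which the single-atom `CH2` on atom 3 and the `OH` on atom 4 are skipped because they overlap it.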
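Define the group segmentation function and output the segmentation result for each molecule

Read the predefined group SMARTS data (stored in Group_SMARTS.npy) and segment a batch of input SMILES strings.

The segmentation result (including the success flag and the status code) is returned for the subsequent property calculations.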

def SMILES2Group(molecules):
    # Read the group SMARTS data (column 1: group name, column 2: SMARTS pattern)
    Group_SMARTS = numpy.load(os.path.join('/share/GC', 'Group_SMARTS.npy'))
    Group_SMARTS = Group_SMARTS.tolist()
    # Segment every molecule with the full group set; group ids start at 1
    Group_SMARTS_id_dict = {i + 1: j[2] for i, j in enumerate(Group_SMARTS)}
    x = [''] * len(molecules)
    # One row per molecule: group counts, then the success flag and the status code
    Group = numpy.zeros((len(molecules), len(Group_SMARTS_id_dict) + 2))
    for index, molname in enumerate(molecules):
        x[index] = smarts_fragment(catalog=Group_SMARTS_id_dict, smi=molname.strip())
        Group[index, len(Group_SMARTS_id_dict)] = x[index][1]
        Group[index, len(Group_SMARTS_id_dict) + 1] = x[index][2]
        for key, value in x[index][0].items():
            Group[index, key - 1] = value
    # Column names: group names followed by 'success' and 'status'
    Group_set = ''
    for i in range(len(Group_SMARTS_id_dict)):
        Group_set = Group_set + Group_SMARTS[i][1] + '\t'
    Group_set = Group_set + 'success\t' + 'status'
    Group_set_split = Group_set.strip().split('\t')
    # Text report: only groups with a non-zero count, then success and status
    string_group = ''
    for i in range(numpy.size(Group, 0)):
        string_group = string_group + str(i + 1) + '\t' + molecules[i].strip() + '\n'
        for j in range(numpy.size(Group, 1) - 2):  # group counts
            if Group[i, j] != 0:
                string_group = string_group + Group_set_split[j] + '\t' + str(int(Group[i, j])) + '\n'
        string_group = string_group + Group_set_split[-2] + '\t' + str(int(Group[i, -2])) + '\n' + Group_set_split[-1] + '\t' + str(int(Group[i, -1]))
        string_group = string_group + '\n--------------------------------------------------------------------------------\n'
    return Group, string_group

Test the function

if __name__ == "__main__":
    Group_SMARTS = numpy.load(os.path.join('/share/GC', 'Group_SMARTS.npy'))
    Group_SMARTS = Group_SMARTS.tolist()
    # Read the input SMILES, one molecule per line
    with open('/share/GC/input.txt', mode='r') as fs:
        molecules = fs.readlines()
    print(molecules)
    # Segment every molecule with the full group set
    Group_SMARTS_id_dict = {i + 1: j[2] for i, j in enumerate(Group_SMARTS)}
    x = [''] * len(molecules)
    Group = numpy.zeros((len(molecules), len(Group_SMARTS_id_dict) + 2))
    for index, molname in enumerate(molecules):
        x[index] = smarts_fragment(catalog=Group_SMARTS_id_dict, smi=molname.strip())
        Group[index, len(Group_SMARTS_id_dict)] = x[index][1]
        Group[index, len(Group_SMARTS_id_dict) + 1] = x[index][2]
        for key, value in x[index][0].items():
            Group[index, key - 1] = value
    # Header row: group names followed by 'success' and 'status'
    Group_set = ''
    for i in range(len(Group_SMARTS_id_dict)):
        Group_set = Group_set + Group_SMARTS[i][1] + '\t'
    Group_set = Group_set + 'success\t' + 'status'
    numpy.savetxt('/share/GC/Group_output.txt', Group, fmt='%d', delimiter='\t', header=Group_set)
    print(Group_set)
    print(Group)

['CCCC\n', 'CCCCO\n', 'CCOCC\n', 'Cc1ccccc1']
CH3	CH2	CH	C	CH2=CH	CH=CH	CH2=C	CH=C	C=C	CH2=C=CH	CH2=C=C	C=C=C	CH#C	C#C	aCH	aC	aC(2)	aC(3)	aN	aC-CH3	aC-CH2	aC-CH	aC-C	aC-CH=CH2	aC-CH=CH	aC-C=CH2	aC-C#CH	aC-C#C	OH	aC-OH	COOH	aC-COOH	CH3CO	CH2CO	CHCO	CCO	aC-CO	CHO	aC-CHO	CH3COO	CH2COO	CHCOO	CCOO	HCOO	aC-COO	aC-OOCH	aC-OOC	COO	CH3O	CH2O	CH-O	C-O	aC-O	CH2NH2	CHNH2	CNH2	CH3NH	CH2NH	CHNH	CH3N	CH2N	aC-NH2	aC-NH	aC-N	NH2	CH=N	C=N	CH2CN	CHCN	CCN	aC-CN	CN	CH2NCO	CHNCO	CNCO	aC-NCO	CH2NO2	CHNO2	CNO2	aC-NO2	NO2	ONO	ONO2	HCON(CH2)2	HCONHCH2	CONH2	CONHCH3	CONHCH2	CON(CH3)2	CONCH3CH2	CON(CH2)2	CONHCO	CONCO	aC-CONH2	aC-NH(CO)H	aC-N(CO)H	aC-CONH	aC-NHCO	aC-(N)CO	NHCONH	NH2CONH	NH2CON	NHCON	NCON	aC-NHCONH2	aC-NHCONH	NHCO	CH2Cl	CHCl	CCl	CHCl2	CCl2	CCl3	CH2F	CHF	CF	CHF2	CF2	CF3	CCl2F	HCClF	CClF2	aC-Cl	aC-F	aC-I	aC-Br	I	Br	F	Cl	CHNOH	CNOH	aC-CHNOH	OCH2CH2OH	OCHCH2OH	OCH2CHOH	O-OH	CH2SH	CHSH	CSH	aC-SH	SH	CH3S	CH2S	CHS	CS	aC-S-	SO	SO2	SO3	SO3(2)	SO4	aC-SO	aC-SO2	PH	P	PO3	PHO3	PO3(2)	PHO4	PO4	aC-PO4	aC-P	CO3	C2H3O	C2H2O	C2HO	CH2(cyc)	CH(cyc)	C(cyc)	CH=CH(cyc)	CH=C(cyc)	C=C(cyc)	CH2=C(cyc)	NH(cyc)	N(cyc)	CH=N(cyc)	C=N(cyc)	O(cyc)	CO(cyc)	S(cyc)	SO2(cyc)	>NH	-O-	-S-	>CO	PO2	CH-N	SiHO	SiO	SiH2	SiH	Si	(CH3)3N	N=N	Ccyc=N-	Ccyc=CH-	Ccyc=NH	N=O	Ccyc=C	P=O	N=N(2)	C=NH	>C=S	aC-CON	aC=O	aN-	Na	K	HCONH	CHOCH	C2O	SiH3	SiH2O	CH=C=CH	CH=C=C	OP(=S)O	R	CF2cyc	CFcyc	H2O	success	status
[[2. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [1. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [2. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 1. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1. 1.]]

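Each row of `Group` (and of `Group_output.txt`) holds one count per group in header order, followed by the `success` and `status` columns. A small helper, sketched here under the hypothetical name `decode_row`, turns a row back into readable group counts. The demo uses only the first 20 header names and the toluene row printed above (`Cc1ccccc1`: five `aCH` and one `aC-CH3`).

```python
def decode_row(header, row):
    """Map one Group-matrix row to ({group: count}, success, status).

    header: the group names followed by 'success' and 'status'
    row:    numbers in the same order (floats, as produced by numpy.zeros)
    """
    *names, _, _ = header
    counts = {name: int(v) for name, v in zip(names, row[:-2]) if v != 0}
    return counts, bool(row[-2]), int(row[-1])


header = ("CH3 CH2 CH C CH2=CH CH=CH CH2=C CH=C C=C CH2=C=CH CH2=C=C C=C=C "
          "CH#C C#C aCH aC aC(2) aC(3) aN aC-CH3").split() + ["success", "status"]
row = [0.0] * 14 + [5.0] + [0.0] * 4 + [1.0] + [1.0, 1.0]  # toluene, truncated to 20 groups
print(decode_row(header, row))  # → ({'aCH': 5, 'aC-CH3': 1}, True, 1)
```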
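2. Property calculation with the group contribution method

Starting from the group segmentation result, more than 20 molecular properties (relative molecular mass, melting point, boiling point, critical properties, and so on) are computed from predefined group contribution values stored in the SQLite database GC_MG1_DB.db.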

import sqlite3
from math import log, exp, sqrt
from time import time
import os

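Define a function to read the group contribution values

Connect to the SQLite database and read the contribution of each group (for example, its contribution to the molecular weight or to the melting point).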

def read_database():
    # Connect to the group contribution database
    conn = sqlite3.connect(os.path.join('/share/GC', 'GC_MG1_DB.db'))
    cur = conn.cursor()
    # Read the whole GC table; row j is used as the contribution row of group j
    cur.execute("""
        select * from GC;
    """)
    GC = cur.fetchall()
    conn.close()
    return GC

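`read_database` returns bare tuples, so the property functions below select columns by position (`GC[j][2]` for Mw, `GC[j][3]` for Tm, and so on), and row `j` is assumed to describe the same group as column `j` of the segmentation matrix. A sketch of a name-based alternative follows. The schema of `GC_MG1_DB.db` is not shown in this notebook, so the demo builds a throwaway in-memory table with hypothetical column names (`group_name`, `Mw`) filled with CH3/CH2 formula masses; `read_gc_table` is likewise a hypothetical name.

```python
import sqlite3


def read_gc_table(conn, table="GC"):
    """Return the rows of `table` as dicts keyed by column name (table name is trusted)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]  # first field of each description entry is the name
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Demo with an in-memory database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GC (group_name TEXT, Mw REAL)")
conn.executemany("INSERT INTO GC VALUES (?, ?)", [("CH3", 15.035), ("CH2", 14.027)])
rows = read_gc_table(conn)
conn.close()
print({r["group_name"]: r["Mw"] for r in rows})
```

Looking columns up by name would make the property functions robust to column reordering, at the cost of depending on the exact column names in the database.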
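Define the valid range of predicted values

Predicted values outside this range are reported as NaN. The failure value -1e10 returned by the property functions below also falls outside the range and is therefore printed as NaN.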

def format1(num, fmt):
    # Values outside [-1e6, 1e10] (including the -1e10 failure value) are reported as 'NaN'
    if num < -1e6 or num > 1e10:
        return 'NaN'
    else:
        return format(num, fmt)

Relative molecular mass

$Mw = \sum_{i} n_{i} Mw_{i}$

where $n_{i}$ is the number of occurrences of group $i$ and $Mw_{i}$ is the contribution of group $i$.

def GC_Mw(len_g, len_m, Groups_result, GC):
    # Mw = sum_i n_i * Mw_i (contributions in column 2 of the GC table)
    Mw = [0 for i in range(len_m)]
    for i in range(len_m):
        for j in range(len_g):
            Mw[i] += Groups_result[0][i][j] * float(GC[j][2])
    return Mw

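As a quick check of the linear form: the Mw contribution of a group is its formula mass. The sketch below assumes standard atomic masses (C 12.011, H 1.008) instead of the values stored in `GC_MG1_DB.db`, and applies them to n-butane (`CCCC`), which the segmentation test above splits into 2 CH3 + 2 CH2.

```python
# Group formula masses from standard atomic masses (not read from the database)
C, H = 12.011, 1.008
contrib = {"CH3": C + 3 * H, "CH2": C + 2 * H}
counts = {"CH3": 2, "CH2": 2}  # n-butane, first row of the Group matrix above
Mw = sum(n * contrib[g] for g, n in counts.items())
print(round(Mw, 3))  # → 58.124
```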
Melting point

$T_{m} = 143.5726 \ln\left(\sum_{i} n_{i} T_{m,i}\right)$

where $n_{i}$ is the number of occurrences of group $i$ and $T_{m,i}$ is the contribution of group $i$.

def GC_Tm(len_g, len_m, Groups_result, GC):
    # Tm = 143.5726 * ln(sum_i n_i * Tm_i) (contributions in column 3)
    Tm = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][3])
        Tm[i] = 143.5726 * log(temp)
    return Tm

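Every GC_* function in this section repeats the same weighted sum and differs only in the database column and the final transformation. A minimal sketch of how they could share one helper is shown here; `gc_property` and the `*_transform` functions are hypothetical names, and the demo uses made-up contributions purely to exercise the code, not real group contribution values.

```python
from math import log


def gc_property(counts, column, transform):
    """Return transform(sum_j counts[j] * column[j]) for one molecule.

    counts:    group occurrence numbers n_j (one row of the Group matrix)
    column:    contribution of each group to the property, same order as counts
    transform: maps the weighted sum to the property value
    """
    total = sum(n * float(c) for n, c in zip(counts, column))
    return transform(total)


# Transformations with the constants used in this section
def Tm_transform(s):
    return 143.5726 * log(s)


def Pc_transform(s):
    return (s + 0.1346) ** -2 + 0.0519


def Vc_transform(s):
    return s + 28.0018


counts = [2, 2]      # two groups of type A and two of type B
column = [1.0, 0.5]  # made-up contributions, for illustration only
for name, f in [("Tm", Tm_transform), ("Pc", Pc_transform), ("Vc", Vc_transform)]:
    print(name, round(gc_property(counts, column, f), 4))
```

Applied to a whole batch, each property would then be a list comprehension over the rows of `Groups_result[0]`, with the column built from the matching database column, e.g. `[row[3] for row in GC]` for Tm.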
Boiling point

$T_{b} = 244.7889 \ln\left(\sum_{i} n_{i} T_{b,i}\right)$

where $n_{i}$ is the number of occurrences of group $i$ and $T_{b,i}$ is the contribution of group $i$.

def GC_Tb(len_g, len_m, Groups_result, GC):
    # Tb = 244.7889 * ln(sum_i n_i * Tb_i) (contributions in column 4)
    Tb = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][4])
        Tb[i] = 244.7889 * log(temp)
    return Tb

Critical temperature

$T_{c} = 181.1926 \ln\left(\sum_{i} n_{i} T_{c,i}\right)$

where $n_{i}$ is the number of occurrences of group $i$ and $T_{c,i}$ is the contribution of group $i$.

def GC_Tc(len_g, len_m, Groups_result, GC):
    # Tc = 181.1926 * ln(sum_i n_i * Tc_i) (contributions in column 5)
    Tc = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][5])
        Tc[i] = 181.1926 * log(temp)
    return Tc

Critical pressure

$P_{c} = \left(\sum_{i} n_{i} P_{c,i} + 0.1346\right)^{-2} + 0.0519$

where $n_{i}$ is the number of occurrences of group $i$ and $P_{c,i}$ is the contribution of group $i$.

def GC_Pc(len_g, len_m, Groups_result, GC):
    # Pc = (sum_i n_i * Pc_i + 0.1346)^-2 + 0.0519 (contributions in column 6)
    Pc = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][6])
        Pc[i] = pow(1 / (temp + 0.1346), 2) + 0.0519
    return Pc

Critical volume

$V_{c} = \sum_{i} n_{i} V_{c,i} + 28.0018$

where $n_{i}$ is the number of occurrences of group $i$ and $V_{c,i}$ is the contribution of group $i$.

def GC_Vc(len_g, len_m, Groups_result, GC):
    # Vc = sum_i n_i * Vc_i + 28.0018 (contributions in column 7)
    Vc = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][7])
        Vc[i] = temp + 28.0018
    return Vc

Standard molar Gibbs energy of formation

$G_{f} = \sum_{i} n_{i} G_{f,i} - 1.3385$

where $n_{i}$ is the number of occurrences of group $i$ and $G_{f,i}$ is the contribution of group $i$.

def GC_Gf(len_g, len_m, Groups_result, GC):
    # Gf = sum_i n_i * Gf_i - 1.3385 (contributions in column 8)
    Gf = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][8])
        Gf[i] = temp - 1.3385
    return Gf

Standard molar enthalpy of formation

$H_{f} = \sum_{i} n_{i} H_{f,i} + 35.1774$

where $n_{i}$ is the number of occurrences of group $i$ and $H_{f,i}$ is the contribution of group $i$.

def GC_Hf(len_g, len_m, Groups_result, GC):
    # Hf = sum_i n_i * Hf_i + 35.1774 (contributions in column 9)
    Hf = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][9])
        Hf[i] = temp + 35.1774
    return Hf

Enthalpy of vaporization (298 K)

$H_{v} = \sum_{i} n_{i} H_{v,i} + 9.6127$

where $n_{i}$ is the number of occurrences of group $i$ and $H_{v,i}$ is the contribution of group $i$.

def GC_Hv(len_g, len_m, Groups_result, GC):
    # Hv = sum_i n_i * Hv_i + 9.6127 (contributions in column 10)
    Hv = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][10])
        Hv[i] = temp + 9.6127
    return Hv

Enthalpy of fusion

$H_{fus} = \sum_{i} n_{i} H_{fus,i} + 4.50666$

where $n_{i}$ is the number of occurrences of group $i$ and $H_{fus,i}$ is the contribution of group $i$.

def GC_Hfus(len_g, len_m, Groups_result, GC):
    # Hfus = sum_i n_i * Hfus_i + 4.50666 (contributions in column 11)
    Hfus = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][11])
        Hfus[i] = temp + 4.50666
    return Hfus

Enthalpy of vaporization (Tb)

$H_{vb} = \sum_{i} n_{i} H_{vb,i} + 15.0884$

where $n_{i}$ is the number of occurrences of group $i$ and $H_{vb,i}$ is the contribution of group $i$.

def GC_Hvb(len_g, len_m, Groups_result, GC):
    # Hvb = sum_i n_i * Hvb_i + 15.0884 (contributions in column 15)
    Hvb = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][15])
        Hvb[i] = temp + 15.0884
    return Hvb

Hildebrand solubility parameter

$Solp = \sum_{i} n_{i} Solp_{i} + 20.7339$

where $n_{i}$ is the number of occurrences of group $i$ and $Solp_{i}$ is the contribution of group $i$.

def GC_Solp(len_g, len_m, Groups_result, GC):
    # Solp = sum_i n_i * Solp_i + 20.7339 (contributions in column 24)
    Solp = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][24])
        Solp[i] = temp + 20.7339
    return Solp

Flash point

$F_{p} = \sum_{i} n_{i} F_{p,i} + 170.7058$

where $n_{i}$ is the number of occurrences of group $i$ and $F_{p,i}$ is the contribution of group $i$.

def GC_Fp(len_g, len_m, Groups_result, GC):
    # Fp = sum_i n_i * Fp_i + 170.7058 (contributions in column 26)
    Fp = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][26])
        Fp[i] = temp + 170.7058
    return Fp

Surface tension (298 K)

$S_{t} = \sum_{i} n_{i} S_{t,i}$

where $n_{i}$ is the number of occurrences of group $i$ and $S_{t,i}$ is the contribution of group $i$.

def GC_St(len_g, len_m, Groups_result, GC):
    # St = sum_i n_i * St_i (contributions in column 27)
    St = [0 for i in range(len_m)]
    for i in range(len_m):
        for j in range(len_g):
            St[i] += Groups_result[0][i][j] * float(GC[j][27])
    return St

Hansen dispersion solubility parameter

$HSP_{D} = \sum_{i} n_{i} HSP_{D,i}$

where $n_{i}$ is the number of occurrences of group $i$ and $HSP_{D,i}$ is the contribution of group $i$.

def GC_hspD(len_g, len_m, Groups_result, GC):
    # hspD = sum_i n_i * hspD_i (contributions in column 32)
    hspD = [0 for i in range(len_m)]
    for i in range(len_m):
        for j in range(len_g):
            hspD[i] += Groups_result[0][i][j] * float(GC[j][32])
    return hspD

Hansen polar solubility parameter

$HSP_{P} = \sum_{i} n_{i} HSP_{P,i}$

where $n_{i}$ is the number of occurrences of group $i$ and $HSP_{P,i}$ is the contribution of group $i$.

def GC_hspP(len_g, len_m, Groups_result, GC):
    # hspP = sum_i n_i * hspP_i (contributions in column 33)
    hspP = [0 for i in range(len_m)]
    for i in range(len_m):
        for j in range(len_g):
            hspP[i] += Groups_result[0][i][j] * float(GC[j][33])
    return hspP

Hansen hydrogen-bonding solubility parameter

$HSP_{H} = \sum_{i} n_{i} HSP_{H,i}$

where $n_{i}$ is the number of occurrences of group $i$ and $HSP_{H,i}$ is the contribution of group $i$.

def GC_hspH(len_g, len_m, Groups_result, GC):
    # hspH = sum_i n_i * hspH_i (contributions in column 34)
    hspH = [0 for i in range(len_m)]
    for i in range(len_m):
        for j in range(len_g):
            hspH[i] += Groups_result[0][i][j] * float(GC[j][34])
    return hspH

Viscosity

$\mu = \exp\left(\sum_{i} n_{i} \mu_{i}\right)$

where $n_{i}$ is the number of occurrences of group $i$ and $\mu_{i}$ is the contribution of group $i$.

def GC_visc(len_g, len_m, Groups_result, GC):
    # visc = exp(sum_i n_i * visc_i) (contributions in column 35)
    visc = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][35])
        visc[i] = exp(temp)
    return visc

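Acentric factor

$\omega = 0.9132 \left[\ln\left(\sum_{i} n_{i} \omega_{i} + 1.0039\right)\right]^{0.0447}$

where $n_{i}$ is the number of occurrences of group $i$ and $\omega_{i}$ is the contribution of group $i$.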

def GC_Acentric(len_g, len_m, Groups_result, GC):
    # omega = 0.9132 * [ln(sum_i n_i * omega_i + 1.0039)]^0.0447 (contributions in column 36)
    Acentric = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][36])
        Acentric[i] = 0.9132 * pow(log(temp + 1.0039), 0.0447)
    return Acentric

Liquid molar volume (298 K)

$V_{m} = 1000 \left(\sum_{i} n_{i} V_{m,i} + 0.0123\right)$

where $n_{i}$ is the number of occurrences of group $i$ and $V_{m,i}$ is the contribution of group $i$.

def GC_Vm_298(len_g, len_m, Groups_result, GC):
    # Vm = 1000 * (sum_i n_i * Vm_i + 0.0123) (contributions in column 37)
    Vm_298 = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][37])
        Vm_298[i] = 1000 * (temp + 0.0123)
    return Vm_298

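LC50 (Fathead Minnow 96-hr)

$-\log LC_{50,FM} = \sum_{i} n_{i} FM_{i} + 2.18$

where $n_{i}$ is the number of occurrences of group $i$ and $FM_{i}$ is the contribution of group $i$; the result is $-\log LC_{50}$ with $LC_{50}$ in mol/L.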

def GC_nlogLC50FM(len_g, len_m, Groups_result, GC):
    # -log(LC50, FM) = sum_i n_i * FM_i + 2.18 (contributions in column 38)
    nlogLC50FM = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][38])
        nlogLC50FM[i] = temp + 2.18
    return nlogLC50FM

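LC50 (Daphnia Magna 48-hr)

$-\log LC_{50,DM} = \sum_{i} n_{i} DM_{i} + 3.59$

where $n_{i}$ is the number of occurrences of group $i$ and $DM_{i}$ is the contribution of group $i$; the result is $-\log LC_{50}$ with $LC_{50}$ in mol/L.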

def GC_nlogLC50DM(len_g, len_m, Groups_result, GC):
    # -log(LC50, DM) = sum_i n_i * DM_i + 3.59 (contributions in column 39)
    nlogLC50DM = [0 for i in range(len_m)]
    for i in range(len_m):
        temp = 0
        for j in range(len_g):
            temp += Groups_result[0][i][j] * float(GC[j][39])
        nlogLC50DM[i] = temp + 3.59
    return nlogLC50DM

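The two toxicity outputs are reported as $-\log LC_{50}$ with $LC_{50}$ in mol/L (see the units in `print_results`), so converting one of them to mg/L only needs the molecular weight. This sketch assumes a base-10 logarithm and uses the aspirin results printed at the end of this notebook (nlogLC50FM = 2.7027, Mw = 180.1597 g/mol); `lc50_mg_per_l` is a hypothetical name.

```python
def lc50_mg_per_l(nlog_lc50, mw):
    """Convert -log10(LC50 in mol/L) to LC50 in mg/L (assumes a base-10 log)."""
    mol_per_l = 10 ** (-nlog_lc50)
    return mol_per_l * mw * 1000.0  # mol/L * g/mol = g/L, then g/L -> mg/L


print(round(lc50_mg_per_l(2.7027, 180.1597), 1))  # about 357 mg/L
```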
Diffusion coefficient

$x_{Dw} = \exp\left(-24.71 + \frac{4209}{T} + 0.04527 T - 0.00003376 T^{2}\right)$

$V_{b} = 0.285 V_{c}^{1.048}$

$D_{w} = \exp\left(\ln 0.01955 - 0.433 \ln V_{b}\right) \cdot \frac{T}{x_{Dw}}$

where $V_{c}$ is the critical volume and $T$ is the temperature.

def GC_Dw(len_m, Temperature, Vc):
    Dw = [0 for i in range(len_m)]
    # Temperature-dependent term, the same for every molecule
    x_Dw = exp(-24.71 + 4209 / Temperature + 0.04527 * Temperature - 0.00003376 * Temperature * Temperature)
    for i in range(len_m):
        try:
            vb = 0.285 * exp(1.048 * log(Vc[i]))  # Vb = 0.285 * Vc^1.048
            Dw[i] = exp(log(.01955) - 0.433 * log(vb)) * Temperature / x_Dw
        except Exception:
            Dw[i] = -1e10  # failure value, printed as NaN
    return Dw

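The diffusion-coefficient formulas can be checked against the printed results. This sketch re-implements them for a single molecule (the name `diffusion_coefficient` is hypothetical) and feeds in the critical volume printed for aspirin (Vc = 471.2066 cm^3/mol) at 298.15 K; it should reproduce the printed Dw of about 0.677, in the same units as `print_results`. At 298.15 K the auxiliary term $x_{Dw}$ evaluates to roughly 0.91, close to the viscosity of water in cP, which suggests it is a water-viscosity correlation, and $V_{b} = 0.285 V_{c}^{1.048}$ has the form of the Tyn-Calus estimate of the molar volume at the normal boiling point.

```python
from math import exp, log


def diffusion_coefficient(Vc, T):
    """Single-molecule version of GC_Dw, with the same constants."""
    x_Dw = exp(-24.71 + 4209 / T + 0.04527 * T - 0.00003376 * T * T)
    Vb = 0.285 * Vc ** 1.048  # same as 0.285 * exp(1.048 * log(Vc))
    return exp(log(0.01955) - 0.433 * log(Vb)) * T / x_Dw


print(round(diffusion_coefficient(471.2066, 298.15), 3))  # → 0.677
```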
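Thermal conductivity

$T_{r} = \frac{T}{T_{c}}$

$T_{br} = \frac{T_{b}}{T_{c}}$

$A = \exp\left(0.6666 \ln\left|1 - T_{r}\right|\right)$

$B = \exp\left(0.6666 \ln\left|1 - T_{br}\right|\right)$

$\lambda = \frac{1.11 (3 + 20 A)}{(3 + 20 B) \sqrt{Mw}}$

where $T$ is the temperature, $T_{c}$ is the critical temperature, $T_{b}$ is the normal boiling point, and $Mw$ is the relative molecular mass.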

def GC_lambda(len_m, Temperature, Tc, Tb, Mw):
    lambda1 = [0 for i in range(len_m)]
    for i in range(len_m):
        try:
            Tr = Temperature / Tc[i]
            Tbr = Tb[i] / Tc[i]
            temp_Tr = exp(0.6666 * log(abs(1 - Tr)))    # |1 - Tr|^0.6666
            temp_Tbr = exp(0.6666 * log(abs(1 - Tbr)))  # |1 - Tbr|^0.6666
            lambda1[i] = 1.11 * (3 + 20 * temp_Tr) / (3 + 20 * temp_Tbr) / sqrt(Mw[i])
        except Exception:
            lambda1[i] = -1e10  # failure value, printed as NaN
    return lambda1

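The thermal-conductivity expression appears to follow the Sato-Riedel correlation for liquids. A single-molecule sketch (hypothetical name `thermal_conductivity`) with the aspirin values printed at the end of the notebook (Tc = 891.4422 K, Tb = 592.0224 K, Mw = 180.1597 g/mol, T = 298.15 K) should reproduce the printed value of about 0.1191 W/m/K.

```python
from math import sqrt


def thermal_conductivity(T, Tc, Tb, Mw):
    """Single-molecule version of GC_lambda, with the same constants."""
    A = abs(1 - T / Tc) ** 0.6666   # equals exp(0.6666 * log(|1 - Tr|))
    B = abs(1 - Tb / Tc) ** 0.6666  # equals exp(0.6666 * log(|1 - Tbr|))
    return 1.11 * (3 + 20 * A) / (3 + 20 * B) / sqrt(Mw)


print(thermal_conductivity(298.15, 891.4422, 592.0224, 180.1597))  # about 0.1191
```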
Saturated vapor pressure

def GC_Psat(len_m, Temperature, Pc, Tc, Tb):
    Psat = [0 for i in range(len_m)]
    Omega = [0 for i in range(len_m)]
    for i in range(len_m):
        try:
            Tr = Temperature / Tc[i]
            Tbr = Tb[i] / Tc[i]
            Pr = 1 / Pc[i]  # reduced pressure at the normal boiling point (Pc in bar)
            # Lee-Kesler functions at Tbr, solved for omega
            F0 = 5.92714 - 6.09648 / Tbr - 1.28862 * log(Tbr) + 0.169347 * pow(Tbr, 6)
            F1 = 15.2518 - 15.6875 / Tbr - 13.4721 * log(Tbr) + 0.43577 * pow(Tbr, 6)
            Omega[i] = (log(Pr) - F0) / F1
            # Lee-Kesler functions at the target temperature
            F01 = 5.92714 - 6.09648 / Tr - 1.28862 * log(Tr) + 0.169347 * pow(Tr, 6)
            F11 = 15.2518 - 15.6875 / Tr - 13.4721 * log(Tr) + 0.43577 * pow(Tr, 6)
            Psat[i] = 1e5 * (Pc[i] * exp(F01 + Omega[i] * F11))  # bar -> Pa
        except Exception:
            Psat[i] = -1e10  # failure value, printed as NaN
    return Psat, Omega

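`GC_Psat` uses the Lee-Kesler vapor-pressure correlation, $\ln P_{r}^{sat} = f^{(0)}(T_{r}) + \omega f^{(1)}(T_{r})$, twice: first at the normal boiling point, where the code sets $P_{r} = 1/P_{c}$ (a boiling pressure of 1 bar with $P_{c}$ in bar), to solve for $\omega$, and then at the target temperature to obtain the vapor pressure in Pa. The value returned as `Omega` is therefore an acentric factor estimated from Tb, Tc and Pc. The sketch below repeats this for one molecule with the aspirin values printed at the end of the notebook (Pc = 34.3293 bar, Tc = 891.4422 K, Tb = 592.0224 K) and should give Omega of about 0.2928 and Psat of about 0.455 Pa; `lee_kesler_psat`, `f0` and `f1` are hypothetical names.

```python
from math import exp, log


def f0(Tr):
    return 5.92714 - 6.09648 / Tr - 1.28862 * log(Tr) + 0.169347 * Tr ** 6


def f1(Tr):
    return 15.2518 - 15.6875 / Tr - 13.4721 * log(Tr) + 0.43577 * Tr ** 6


def lee_kesler_psat(T, Pc, Tc, Tb):
    """Return (Psat in Pa, omega), mirroring GC_Psat for one molecule (Pc in bar)."""
    Tbr = Tb / Tc
    omega = (log(1 / Pc) - f0(Tbr)) / f1(Tbr)       # Lee-Kesler at the normal boiling point
    Tr = T / Tc
    psat = 1e5 * Pc * exp(f0(Tr) + omega * f1(Tr))  # bar -> Pa
    return psat, omega


psat, omega = lee_kesler_psat(298.15, 34.3293, 891.4422, 592.0224)
print(round(omega, 4), round(psat, 3))
```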
Density

def GC_rho(len_m, Temperature, Omega, Tc, Pc, Mw):
    rho = [0 for i in range(len_m)]
    for i in range(len_m):
        try:
            Zra = 0.29056 - 0.08775 * Omega[i]
            Tr = Temperature / Tc[i]
            temp = exp(0.285714 * log(1 - Tr)) + 1  # 1 + (1 - Tr)^(2/7)
            temp1 = exp(temp * log(Zra))            # Zra^(1 + (1 - Tr)^(2/7))
            rho[i] = Mw[i] * Pc[i] / 83.14 / Tc[i] / temp1
        except Exception:
            rho[i] = -1e10  # failure value, printed as NaN
    return rho

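`GC_rho` evaluates the Rackett equation for the saturated liquid volume, $V = \frac{R T_{c}}{P_{c}} Z_{RA}^{1 + (1 - T_{r})^{2/7}}$ with $R = 83.14$ cm^3·bar/(mol·K) and $Z_{RA} = 0.29056 - 0.08775\omega$ (the constant 0.285714 in the code is 2/7), and returns $\rho = Mw / V$. A single-molecule sketch (hypothetical name `rackett_density`) with the aspirin values printed at the end of the notebook (Omega = 0.2928, Tc = 891.4422 K, Pc = 34.3293 bar, Mw = 180.1597 g/mol) should give about 1.028 g/cm^3.

```python
def rackett_density(T, omega, Tc, Pc, Mw):
    """Liquid density in g/cm^3 via the Rackett equation, mirroring GC_rho (Pc in bar)."""
    R = 83.14                          # cm^3 bar / (mol K)
    Z_RA = 0.29056 - 0.08775 * omega   # Rackett compressibility from the acentric factor
    Tr = T / Tc
    V = R * Tc / Pc * Z_RA ** (1 + (1 - Tr) ** (2 / 7))  # cm^3/mol
    return Mw / V


print(round(rackett_density(298.15, 0.2928, 891.4422, 34.3293, 180.1597), 3))
```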
Solubility parameter

def GC_SolPar(len_m, Temperature, Omega, Pc, Tc, Hv):
    SolPar = [0 for i in range(len_m)]
    for i in range(len_m):
        try:
            # Rackett liquid molar volume (cm^3/mol), as in GC_rho
            Zra = 0.29056 - 0.08775 * Omega[i]
            Tr = Temperature / Tc[i]
            temp = exp(0.285714 * log(1 - Tr)) + 1
            temp1 = exp(temp * log(Zra))
            Vm = 83.14 * Tc[i] * temp1 / Pc[i]
            # delta = sqrt((Hv - RT) / Vm), with Hv converted from kJ/mol to J/mol
            SolPar[i] = sqrt((1000 * Hv[i] - 8.314 * Temperature) / Vm)
        except Exception:
            SolPar[i] = -1e10  # failure value, printed as NaN
    return SolPar

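`GC_SolPar` combines the same Rackett liquid volume with the enthalpy of vaporization: $\delta = \sqrt{(1000 H_{v} - R T)/V_{m}}$, with $H_{v}$ in kJ/mol, $R = 8.314$ J/(mol·K) and $V_{m}$ in cm^3/mol, so $\delta$ comes out in MPa^0.5. Aspirin has no valid Hv in the printed results, so this sketch (hypothetical name `solubility_parameter`) uses the ibuprofen values instead (Hv = 77.6696 kJ/mol, Omega = 0.6762, Tc = 786.3869 K, Pc = 22.6432 bar) and should give about 20.11 MPa^0.5.

```python
from math import sqrt


def solubility_parameter(T, omega, Tc, Pc, Hv):
    """Solubility parameter in MPa^0.5, mirroring GC_SolPar for one molecule."""
    Z_RA = 0.29056 - 0.08775 * omega
    Tr = T / Tc
    Vm = 83.14 * Tc / Pc * Z_RA ** (1 + (1 - Tr) ** (2 / 7))  # Rackett volume, cm^3/mol
    return sqrt((1000 * Hv - 8.314 * T) / Vm)                  # J/cm^3 equals MPa


print(round(solubility_parameter(298.15, 0.6762, 786.3869, 22.6432, 77.6696), 2))
```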
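Define the group contribution function that computes all properties

Combine all the property functions above and compute every target property for a batch of input molecules.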

def GroupContribution(len_g, len_m, Groups_result, GC, Temperature):
    # Properties computed directly from the group contributions
    Mw = GC_Mw(len_g, len_m, Groups_result, GC)
    Tm = GC_Tm(len_g, len_m, Groups_result, GC)
    Tb = GC_Tb(len_g, len_m, Groups_result, GC)
    Tc = GC_Tc(len_g, len_m, Groups_result, GC)
    Pc = GC_Pc(len_g, len_m, Groups_result, GC)
    Vc = GC_Vc(len_g, len_m, Groups_result, GC)
    Gf = GC_Gf(len_g, len_m, Groups_result, GC)
    Hf = GC_Hf(len_g, len_m, Groups_result, GC)
    Hv = GC_Hv(len_g, len_m, Groups_result, GC)
    Hfus = GC_Hfus(len_g, len_m, Groups_result, GC)
    Hvb = GC_Hvb(len_g, len_m, Groups_result, GC)
    Solp = GC_Solp(len_g, len_m, Groups_result, GC)
    Fp = GC_Fp(len_g, len_m, Groups_result, GC)
    St = GC_St(len_g, len_m, Groups_result, GC)
    hspD = GC_hspD(len_g, len_m, Groups_result, GC)
    hspP = GC_hspP(len_g, len_m, Groups_result, GC)
    hspH = GC_hspH(len_g, len_m, Groups_result, GC)
    visc = GC_visc(len_g, len_m, Groups_result, GC)
    Acentric = GC_Acentric(len_g, len_m, Groups_result, GC)
    Vm_298 = GC_Vm_298(len_g, len_m, Groups_result, GC)
    nlogLC50FM = GC_nlogLC50FM(len_g, len_m, Groups_result, GC)
    nlogLC50DM = GC_nlogLC50DM(len_g, len_m, Groups_result, GC)
    # Properties derived from the ones above at the given temperature
    Dw = GC_Dw(len_m, Temperature, Vc)
    lambda1 = GC_lambda(len_m, Temperature, Tc, Tb, Mw)
    Psat, Omega = GC_Psat(len_m, Temperature, Pc, Tc, Tb)
    rho = GC_rho(len_m, Temperature, Omega, Tc, Pc, Mw)
    SolPar = GC_SolPar(len_m, Temperature, Omega, Pc, Tc, Hv)
    return Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar

Define the result output function

Format the computed results for readability; values outside the valid range are shown as NaN.

def print_results(molecules, len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar):
    # Uncomment to also print the group segmentation report
    #print('\n********************************************************************************\n')
    #print('\nGroup Segments:\n--------------------------------------------------------------------------------')
    #print(Groups_result[1])
    # display results
    property_prediction = '\n********************************************************************************\n'
    property_prediction += '\nProperty prediction (Group contribution):\n--------------------------------------------------------------------------------\n'
    for i in range(len_m):
        property_prediction += str(i + 1) + '\t' + molecules[i] + '\n'
        # Column -2 of the group matrix is the segmentation success flag
        if Groups_result[0][i][-2] != 1:
            property_prediction += 'Warning: group segmentation failed for this molecule, so the property prediction is invalid!\n'
        property_prediction += 'Mw: Molecular weight (g/mol)\t\t\t' + format1(Mw[i], '.4f') + '\n'
        property_prediction += 'Tm: Normal melting point (K)\t\t\t' + format1(Tm[i], '.4f') + '\n'
        property_prediction += 'Tb: Normal boiling point (K)\t\t\t' + format1(Tb[i], '.4f') + '\n'
        property_prediction += 'Tc: Critical temperature (K)\t\t\t' + format1(Tc[i], '.4f') + '\n'
        property_prediction += 'Pc: Critical pressure (bar)\t\t\t' + format1(Pc[i], '.4f') + '\n'
        property_prediction += 'Vc: Critical volume (cm^3/mol)\t\t\t' + format1(Vc[i], '.4f') + '\n'
        property_prediction += 'Gf: Standard Gibbs free energy of formation\t' + format1(Gf[i], '.4f') + '\n'
        property_prediction += '    at 298K (kJ/mol)\n'
        property_prediction += 'Hf: Standard enthalpy of formation\t\t' + format1(Hf[i], '.4f') + '\n'
        property_prediction += '    at 298K (kJ/mol)\n'
        property_prediction += 'Hv: Enthalpy of vaporization at 298K (kJ/mol)\t' + format1(Hv[i], '.4f') + '\n'
        property_prediction += 'Hfus: Enthalpy of fusion (kJ/mol)\t\t' + format1(Hfus[i], '.4f') + '\n'
        property_prediction += 'Hvb: Enthalpy of vaporization at Tb (kJ/mol)\t' + format1(Hvb[i], '.4f') + '\n'
        property_prediction += 'Solp: Hildebrand solubility parameter\t\t' + format1(Solp[i], '.4f') + '\n'
        property_prediction += '      at 298K (MPa^0.5)\n'
        property_prediction += 'Fp: Flash point (K)\t\t\t\t' + format1(Fp[i], '.4f') + '\n'
        property_prediction += 'St: Surface tension at 298K (dyn/cm)\t\t' + format1(St[i], '.4f') + '\n'
        property_prediction += 'hspD: Hansen dispersive solubility parameter\t' + format1(hspD[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'hspP: Hansen polar solubility parameter\t\t' + format1(hspP[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'hspH: Hansen Hydrogen-bond solubility parameter\t' + format1(hspH[i], '.4f') + '\n'
        property_prediction += '      (MPa^0.5)\n'
        property_prediction += 'visc: Viscosity (cP)\t\t\t\t' + format1(visc[i], '.4f') + '\n'
        property_prediction += 'Acentric: Pitzer\'s Acentric Factor\t\t' + format1(Acentric[i], '.4f') + '\n'
        property_prediction += 'Vm_298: Liquid molar volume at 298K (cm^3/mol)\t' + format1(Vm_298[i], '.4f') + '\n'
        property_prediction += 'nlogLC50FM: Fathead Minnow 96-hr LC50\t\t' + format1(nlogLC50FM[i], '.4f') + '\n'
        property_prediction += '            (-log(mol/L))\n'
        property_prediction += 'nlogLC50DM: Daphnia Magna 48-hr LC50\t\t' + format1(nlogLC50DM[i], '.4f') + '\n'
        property_prediction += '            (-log(mol/L))\n'
        property_prediction += 'Dw: Diffusion coefficient (cm/s)\t\t' + format1(Dw[i], '.4f') + '\n'
        property_prediction += 'lambda: Thermal conductivity (W/m/K)\t\t' + format1(lambda1[i], '.4f') + '\n'
        property_prediction += 'Psat: Vapor pressure (Pa)\t\t\t' + format1(Psat[i], '.4f') + '\n'
        property_prediction += 'Omega: Acentric factor (Lee-Kesler)\t\t' + format1(Omega[i], '.4f') + '\n'
        property_prediction += 'rho: Density (g/cm^3)\t\t\t\t' + format1(rho[i], '.4f') + '\n'
        property_prediction += 'SolPar: Solubility parameter (MPa^0.5)\t\t' + format1(SolPar[i], '.4f') + '\n'
        property_prediction += '--------------------------------------------------------------------------------\n'
    print(property_prediction)
    return property_prediction

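Define the main group contribution function

File structure

Input: SMILES strings (read from input.txt in the segmentation test, or given as a Python list as in the test below), the group SMARTS data (Group_SMARTS.npy), and the contribution database (GC_MG1_DB.db).
Output: the group segmentation result (written to Group_output.txt by the segmentation test) and the property prediction report (printed to the console).

Main workflow

Pass the SMILES list and the temperature to the main function PyGC.
Run the group segmentation to obtain the group counts and status codes.
Read the database and compute each property.
Format and print the results with print_results; the test cell also reports the wall-clock time.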

def PyGC(molecules, Temperature):
    print('\n********************************************************************************\n')
    print('> PyGC: Group Contribution based Property Prediction')
    print('> Lei Zhang (keleiz@dlut.edu.cn); Qilei Liu (liuqilei@dlut.edu.cn)')
    print('> Mar. 22, 2025')
    print('> Institute of chemical process systems engineering')
    print('> School of chemical engineering')
    print('> Dalian University of Technology, Dalian 116024, China')
    # group segment
    Groups_result = SMILES2Group(molecules)
    # read group contribution database
    GC = read_database()
    len_g = len(GC)
    len_m = len(molecules)
    # property prediction
    Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar = GroupContribution(len_g, len_m, Groups_result, GC, Temperature)
    return len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar

Test the function

if __name__ == "__main__":
    start_time = time()
    # input data: SMILES strings and the temperature (K)
    molecules = ['CC(=O)OC1=CC=CC=C1C(O)=O', 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O', 'CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1', 'Cc1c([N+](=O)[O-])cc([N+](=O)[O-])cc1[N+](=O)[O-]', 'C(=C(C(=Cc1ccccc1)c1ccccc1)c1ccccc1)c1ccccc1']
    Temperature = 298.15
    #try:
    # property prediction
    len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar = PyGC(molecules, Temperature)
    # print results
    print_results(molecules, len_m, Groups_result, Mw, Tm, Tb, Tc, Pc, Vc, Gf, Hf, Hv, Hfus, Hvb, Solp, Fp, St, hspD, hspP, hspH, visc, Acentric, Vm_298, nlogLC50FM, nlogLC50DM, Dw, lambda1, Psat, Omega, rho, SolPar)
    #except:
    #    print('\n> There is something wrong with the input SMILES!')
    end_time = time()
    print('\nWALL TIME:\t', format(end_time - start_time, '.4f'), ' (s)\n')

********************************************************************************

> PyGC: Group Contribution based Property Prediction
> Lei Zhang (keleiz@dlut.edu.cn); Qilei Liu (liuqilei@dlut.edu.cn)
> Mar. 22, 2025
> Institute of chemical process systems engineering
> School of chemical engineering
> Dalian University of Technology, Dalian 116024, China

********************************************************************************

Property prediction (Group contribution):
--------------------------------------------------------------------------------
1	CC(=O)OC1=CC=CC=C1C(O)=O
Mw: Molecular weight (g/mol)			180.1597
Tm: Normal melting point (K)			436.6321
Tb: Normal boiling point (K)			592.0224
Tc: Critical temperature (K)			891.4422
Pc: Critical pressure (bar)			34.3293
Vc: Critical volume (cm^3/mol)			471.2066
Gf: Standard Gibbs free energy of formation	-532.5249
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-680.0922
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	NaN
Hfus: Enthalpy of fusion (kJ/mol)		40.4855
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		20.9713
      at 298K (MPa^0.5)
Fp: Flash point (K)				483.2930
St: Surface tension at 298K (dym/cm)		NaN
hspD: Hansen dispersive solubility parameter	21.0164
      (MPa^0.5)
hspP: Hansen polar solubility parameter		6.0868
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	9.8884
      (MPa^0.5)
visc: Viscosity (cP)				0.4874
Acentric: Pitzer's Acentric Factor		0.8252
Vm_298: Liquid molar volume at 298K (cm^3/mol)	155.6850
nlogLC50FM: Fathead Minnow 96-hr LC50		2.7027
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.8246
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.6770
lambda: Thermal conductivity (W/m/K)		0.1191
Psat: Vapor pressure (Pa)			0.4547
Omega: Compressibility factor			0.2928
rho: Density (g/cm^3)				1.0280
SolPar: Solubility parameter (MPa^0.5)		NaN
--------------------------------------------------------------------------------
2	CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
Mw: Molecular weight (g/mol)			206.2852
Tm: Normal melting point (K)			343.3547
Tb: Normal boiling point (K)			582.2050
Tc: Critical temperature (K)			786.3869
Pc: Critical pressure (bar)			22.6432
Vc: Critical volume (cm^3/mol)			663.4145
Gf: Standard Gibbs free energy of formation	-187.4798
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-448.4679
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	77.6696
Hfus: Enthalpy of fusion (kJ/mol)		28.0466
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		19.2869
      at 298K (MPa^0.5)
Fp: Flash point (K)				446.5630
St: Surface tension at 298K (dym/cm)		28.7115
hspD: Hansen dispersive solubility parameter	17.6759
      (MPa^0.5)
hspP: Hansen polar solubility parameter		2.0559
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	4.3485
      (MPa^0.5)
visc: Viscosity (cP)				13.4884
Acentric: Pitzer's Acentric Factor		0.8248
Vm_298: Liquid molar volume at 298K (cm^3/mol)	202.2285
nlogLC50FM: Fathead Minnow 96-hr LC50		3.9587
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		4.2037
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.5796
lambda: Thermal conductivity (W/m/K)		0.1218
Psat: Vapor pressure (Pa)			0.0451
Omega: Compressibility factor			0.6762
rho: Density (g/cm^3)				1.1090
SolPar: Solubility parameter (MPa^0.5)		20.1055
--------------------------------------------------------------------------------
3	CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1
Mw: Molecular weight (g/mol)			239.3089
Tm: Normal melting point (K)			408.2666
Tb: Normal boiling point (K)			642.1434
Tc: Critical temperature (K)			835.7446
Pc: Critical pressure (bar)			28.0000
Vc: Critical volume (cm^3/mol)			713.2700
Gf: Standard Gibbs free energy of formation	-201.2924
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-551.1721
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	151.4397
Hfus: Enthalpy of fusion (kJ/mol)		38.3394
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		25.0683
      at 298K (MPa^0.5)
Fp: Flash point (K)				578.0831
St: Surface tension at 298K (dym/cm)		38.4238
hspD: Hansen dispersive solubility parameter	17.8190
      (MPa^0.5)
hspP: Hansen polar solubility parameter		7.7930
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	24.9708
      (MPa^0.5)
visc: Viscosity (cP)				5568.3835
Acentric: Pitzer's Acentric Factor		0.8459
Vm_298: Liquid molar volume at 298K (cm^3/mol)	206.8158
nlogLC50FM: Fathead Minnow 96-hr LC50		3.1616
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.5715
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.5609
lambda: Thermal conductivity (W/m/K)		0.1218
Psat: Vapor pressure (Pa)			0.0000
Omega: Compressibility factor			1.1138
rho: Density (g/cm^3)				2.1342
SolPar: Solubility parameter (MPa^0.5)		36.4482
--------------------------------------------------------------------------------
4	Cc1c([N+](=O)[O-])cc([N+](=O)[O-])cc1[N+](=O)[O-]
Mw: Molecular weight (g/mol)			227.1332
Tm: Normal melting point (K)			396.5192
Tb: Normal boiling point (K)			633.1089
Tc: Critical temperature (K)			862.9637
Pc: Critical pressure (bar)			31.2272
Vc: Critical volume (cm^3/mol)			578.2647
Gf: Standard Gibbs free energy of formation	196.8224
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		-40.1631
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	99.9948
Hfus: Enthalpy of fusion (kJ/mol)		32.1185
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	NaN
Solp: Hildebrand solubility parameter		22.3972
      at 298K (MPa^0.5)
Fp: Flash point (K)				554.0767
St: Surface tension at 298K (dym/cm)		64.1788
hspD: Hansen dispersive solubility parameter	20.3463
      (MPa^0.5)
hspP: Hansen polar solubility parameter		11.4278
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	5.3196
      (MPa^0.5)
visc: Viscosity (cP)				35.2759
Acentric: Pitzer's Acentric Factor		0.8281
Vm_298: Liquid molar volume at 298K (cm^3/mol)	148.2411
nlogLC50FM: Fathead Minnow 96-hr LC50		4.8665
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		4.1870
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.6169
lambda: Thermal conductivity (W/m/K)		0.1180
Psat: Vapor pressure (Pa)			0.0004
Omega: Compressibility factor			0.7850
rho: Density (g/cm^3)				1.6942
SolPar: Solubility parameter (MPa^0.5)		26.9697
--------------------------------------------------------------------------------
5	C(=C(C(=Cc1ccccc1)c1ccccc1)c1ccccc1)c1ccccc1
Mw: Molecular weight (g/mol)			358.4774
Tm: Normal melting point (K)			433.2688
Tb: Normal boiling point (K)			741.3036
Tc: Critical temperature (K)			1061.0020
Pc: Critical pressure (bar)			18.8377
Vc: Critical volume (cm^3/mol)			1195.7405
Gf: Standard Gibbs free energy of formation	742.1677
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation		487.1753
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol)	135.8193
Hfus: Enthalpy of fusion (kJ/mol)		47.9394
Hvb: Enthalpy of vaporization at Tb (kJ/mol)	95.9622
Solp: Hildebrand solubility parameter		19.4211
      at 298K (MPa^0.5)
Fp: Flash point (K)				601.7472
St: Surface tension at 298K (dym/cm)		NaN
hspD: Hansen dispersive solubility parameter	38.2952
      (MPa^0.5)
hspP: Hansen polar solubility parameter		7.9674
      (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter	4.1192
      (MPa^0.5)
visc: Viscosity (cP)				4.0718
Acentric: Pitzer's Acentric Factor		0.8221
Vm_298: Liquid molar volume at 298K (cm^3/mol)	314.8005
nlogLC50FM: Fathead Minnow 96-hr LC50		5.4598
            (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50		-0.8082
            (-log(mol/L))
Dw: Diffusion coefficient (cm/s)		0.4437
lambda: Thermal conductivity (W/m/K)		0.0932
Psat: Vapor pressure (Pa)			0.0026
Omega: Compressibility factor			0.2668
rho: Density (g/cm^3)				0.9526
SolPar: Solubility parameter (MPa^0.5)		18.8236
--------------------------------------------------------------------------------


WALL TIME:	 0.1019  (s)
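The two LC50 outputs are reported as -log(mol/L). A larger value therefore means a lower lethal concentration, i.e. higher toxicity, and a negative value means an LC50 above 1 mol/L. Ecotoxicity data are usually quoted in mg/L, so the values can be converted with the molecular weight from the same output block: LC50 [mg/L] = 10^(-value) × Mw × 1000. This is a minimal sketch that assumes the logarithm is base 10. The helper name `lc50_mg_per_l` is hypothetical and does not come from the original code.

```python
def lc50_mg_per_l(neg_log_molar: float, mw_g_per_mol: float) -> float:
    """Convert an LC50 given as -log10(mol/L) into mg/L (assumes a base-10 log)."""
    molar = 10.0 ** (-neg_log_molar)      # mol/L
    return molar * mw_g_per_mol * 1000.0  # mol/L * g/mol = g/L, then g -> mg

# Values for molecule 1 (CC(=O)OC1=CC=CC=C1C(O)=O) copied from the output above
mw = 180.1597
fathead = lc50_mg_per_l(2.7027, mw)   # Fathead Minnow 96-hr
daphnia = lc50_mg_per_l(-0.8246, mw)  # Daphnia Magna 48-hr
print(f"Fathead Minnow LC50: {fathead:.1f} mg/L")
print(f"Daphnia Magna LC50: {daphnia:.3g} mg/L")
```

For molecule 1 this gives roughly 357 mg/L for the fathead minnow. The negative Daphnia magna value corresponds to an LC50 above 1 mol/L (over 10^6 mg/L).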


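Notes

Group coverage: if a molecule contains groups that are not defined in the library (status code 3 or 4), the prediction is not valid and the group SMARTS library has to be extended.
Formula limitations: some of the property correlations were fitted to specific datasets, so predictions that extrapolate beyond those data can carry large errors.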
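Both caveats can be checked programmatically. The following is a minimal sketch in plain Python. It assumes the accepted SMARTS matches are available as tuples of atom indices, which is the form RDKit's `GetSubstructMatches` returns. It also assumes, as the NaN entries in the output suggest, that out-of-range predictions are reported as NaN. The helper names `uncovered_atoms` and `mask_out_of_range`, the example matches and the ranges are all hypothetical. The sketch does not reproduce how the original code distinguishes status 3 from status 4.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

def uncovered_atoms(n_atoms: int, matches: Iterable[Sequence[int]]) -> List[int]:
    """Indices of (heavy) atoms that no accepted group match covers."""
    covered = set()
    for match in matches:
        covered.update(match)
    return [i for i in range(n_atoms) if i not in covered]

def mask_out_of_range(values: Dict[str, float],
                      ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Replace predictions outside their (low, high) range with NaN."""
    masked = {}
    for name, value in values.items():
        # Properties without a declared range are left unbounded
        low, high = ranges.get(name, (-math.inf, math.inf))
        # NaN compares False, so a NaN input also stays NaN
        masked[name] = value if low <= value <= high else math.nan
    return masked

# Acetic acid "CC(=O)O": atoms 0..3; hypothetical matches for CH3 and COOH
print(uncovered_atoms(4, [(0,), (1, 2, 3)]))  # every atom covered
print(uncovered_atoms(4, [(0,)]))             # COOH group not matched

# Hypothetical valid ranges, not the ones used by the original code
print(mask_out_of_range({"Tb": 560.0, "Hv": 250.0}, {"Hv": (0.0, 200.0)}))
```

Checking coverage before evaluating any correlation reports an incomplete group set once, up front, rather than letting it surface later as silently wrong contribution sums.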


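Summary

This code implements a complete pipeline from molecular structure to the prediction of many properties. It combines RDKit's molecule handling with the theoretical framework of the group contribution method and provides a fast property-prediction tool for chemistry, chemical engineering, materials science and related fields: the five test molecules above took about 0.1 s of wall time. The core logic consists of SMARTS pattern matching, group counting, evaluation of the property correlations and formatted output of the results, which makes it suitable for batch property analysis of many molecules.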
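The core group contribution step summarized above can be sketched for a single property. The groups are counted, the products N_i × C_i are summed, and the sum is passed through a property function. This minimal sketch uses the first-order Marrero-Gani form for the normal boiling point, Tb = Tb0 · ln(Σ N_i C_i), with Tb0 = 222.543 K as given in Marrero and Gani (2001). Second- and third-order terms are omitted. The group names, contribution values and helper functions are placeholders for illustration. They are not the notebook's own code or the published contribution table.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# Marrero-Gani (2001) adjustable constant for the normal boiling point, in K
TB0 = 222.543

def count_groups(group_names: Iterable[str]) -> Counter:
    """Group statistics: how many times each group was assigned."""
    return Counter(group_names)

def first_order_tb(counts: Dict[str, int], contrib: Dict[str, float]) -> float:
    """First-order estimate Tb = Tb0 * ln(sum_i N_i * C_i)."""
    missing = sorted(g for g in counts if g not in contrib)
    if missing:
        # Analogous to an incomplete group set: no prediction is possible
        raise KeyError(f"no contribution value for groups: {missing}")
    total = sum(n * contrib[g] for g, n in counts.items())
    if total <= 0:
        raise ValueError("sum of contributions must be positive for ln()")
    return TB0 * math.log(total)

# Placeholder contributions for illustration only (NOT the published table)
contrib = {"CH3": 1.0, "CH2": 0.5, "OH": 2.0}
counts = count_groups(["CH3", "CH2", "OH"])
print(f"Tb estimate: {first_order_tb(counts, contrib):.2f} K")
```

The other properties follow the same pattern with their own contribution columns and property functions.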

