



Join the following project first: bohrium.dp.tech/projects/share/455831
Image: Custom image -> leiz-dlut:chem
Group Contribution method (GC method)
Marrero, Jorge A. and Rafiqul Gani. “Group-contribution based estimation of pure component properties.” Fluid Phase Equilibria 183 (2001): 183-208.
https://www.sciencedirect.com/science/article/pii/S0378381201004319
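1. Group fragmentation algorithm
Given the SMILES string of a molecule, the algorithm uses predefined group SMARTS patterns to split the molecule into a set of groups and records the name and count of each group.
Import the RDKit library.
Define the functions for substructure search, merging, and counting.
- Substructure matching: use RDKit SMARTS pattern matching to find every occurrence of each group in the molecule.
- Atom coverage check: verify that every atom is covered by a group; otherwise return status code 3 or 4 (incomplete group set).
- Substructure merging: a bubble sort orders the groups so that larger groups are processed first (which avoids double counting), then the independent groups are merged so that no atom is left out.
- Output: return the group counts, the match flag (success), and the status code (status).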
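The merging and coverage steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the function name `assign_groups` is hypothetical, Python's built-in `sorted` stands in for the bubble sort, and the matches are written by hand in the form of atom-index tuples such as those returned by RDKit's `Mol.GetSubstructMatches`.

```python
from collections import Counter

def assign_groups(n_atoms, matches):
    """Assign non-overlapping groups to atoms, trying larger groups first.

    matches: list of (group_name, atom_indices) pairs.
    Returns (counts, success); success is True only if every atom is covered.
    """
    covered = set()
    counts = Counter()
    # Larger matches come first, so CH2O wins over a bare CH2 on the same carbon.
    for name, atoms in sorted(matches, key=lambda m: len(m[1]), reverse=True):
        if covered.isdisjoint(atoms):  # skip matches that reuse a covered atom
            covered.update(atoms)
            counts[name] += 1
    success = len(covered) == n_atoms
    return counts, success

# Hand-written matches for diethyl ether, SMILES "CCOCC" (atoms 0..4).
matches = [
    ("CH3", (0,)), ("CH3", (4,)),
    ("CH2", (1,)), ("CH2", (3,)),
    ("CH2O", (1, 2)), ("CH2O", (3, 2)),
]
counts, success = assign_groups(5, matches)
print(dict(counts), success)
```

Only one of the two CH2O matches can be kept because both claim the oxygen atom; the carbon left over falls back to a plain CH2 group, giving two CH3, one CH2, and one CH2O.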
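Define the molecule fragmentation function, which outputs the group fragmentation result for a molecule.
Read the predefined group SMARTS data (stored in Group_SMARTS.npy) and fragment multiple input SMILES molecules in batch.
Output the fragmentation results (including the success flag and status code) for use in the subsequent property calculations.
Test function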
Input SMILES:
['CCCC\n', 'CCCCO\n', 'CCOCC\n', 'Cc1ccccc1']
Output columns:
CH3 CH2 CH C CH2=CH CH=CH CH2=C CH=C C=C CH2=C=CH CH2=C=C C=C=C CH#C C#C aCH aC aC(2) aC(3) aN aC-CH3 aC-CH2 aC-CH aC-C aC-CH=CH2 aC-CH=CH aC-C=CH2 aC-C#CH aC-C#C OH aC-OH COOH aC-COOH CH3CO CH2CO CHCO CCO aC-CO CHO aC-CHO CH3COO CH2COO CHCOO CCOO HCOO aC-COO aC-OOCH aC-OOC COO CH3O CH2O CH-O C-O aC-O CH2NH2 CHNH2 CNH2 CH3NH CH2NH CHNH CH3N CH2N aC-NH2 aC-NH aC-N NH2 CH=N C=N CH2CN CHCN CCN aC-CN CN CH2NCO CHNCO CNCO aC-NCO CH2NO2 CHNO2 CNO2 aC-NO2 NO2 ONO ONO2 HCON(CH2)2 HCONHCH2 CONH2 CONHCH3 CONHCH2 CON(CH3)2 CONCH3CH2 CON(CH2)2 CONHCO CONCO aC-CONH2 aC-NH(CO)H aC-N(CO)H aC-CONH aC-NHCO aC-(N)CO NHCONH NH2CONH NH2CON NHCON NCON aC-NHCONH2 aC-NHCONH NHCO CH2Cl CHCl CCl CHCl2 CCl2 CCl3 CH2F CHF CF CHF2 CF2 CF3 CCl2F HCClF CClF2 aC-Cl aC-F aC-I aC-Br I Br F Cl CHNOH CNOH aC-CHNOH OCH2CH2OH OCHCH2OH OCH2CHOH O-OH CH2SH CHSH CSH aC-SH SH CH3S CH2S CHS CS aC-S- SO SO2 SO3 SO3(2) SO4 aC-SO aC-SO2 PH P PO3 PHO3 PO3(2) PHO4 PO4 aC-PO4 aC-P CO3 C2H3O C2H2O C2HO CH2(cyc) CH(cyc) C(cyc) CH=CH(cyc) CH=C(cyc) C=C(cyc) CH2=C(cyc) NH(cyc) N(cyc) CH=N(cyc) C=N(cyc) O(cyc) CO(cyc) S(cyc) SO2(cyc) >NH -O- -S- >CO PO2 CH-N SiHO SiO SiH2 SiH Si (CH3)3N N=N Ccyc=N- Ccyc=CH- Ccyc=NH N=O Ccyc=C P=O N=N(2) C=NH >C=S aC-CON aC=O aN- Na K HCONH CHOCH C2O SiH3 SiH2O CH=C=CH CH=C=C OP(=S)O R CF2cyc CFcyc H2O success status
Non-zero entries of each row of the output array (all other entries are 0.):
SMILES       Group counts                   success  status
CCCC         CH3 = 2, CH2 = 2               1        1
CCCCO        CH3 = 1, CH2 = 3, OH = 1       1        1
CCOCC        CH3 = 2, CH2 = 1, CH2O = 1     1        1
Cc1ccccc1    aCH = 5, aC-CH3 = 1            1        1
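2. Property calculation with the group contribution method
Starting from the fragmentation results, the predefined group contribution values (stored in the SQLite database GC_MG1_DB.db) are combined through mathematical formulas to compute more than 20 molecular properties (such as molecular weight, melting point, boiling point, and critical parameters).
Define the function that reads the group contribution data
Connect to the SQLite database and read the contribution values of each group (such as the molecular weight and melting point contributions).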
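Reading contribution values from SQLite might look like the sketch below. The schema of GC_MG1_DB.db is not shown in this document, so the table name `contributions`, its column names, and the helper `read_contributions` are all hypothetical, and an in-memory database stands in for the real file. The two group masses are computed from standard atomic weights (C 12.011, H 1.008) and are not values taken from the database.

```python
import sqlite3

def read_contributions(conn, prop):
    """Return {group_name: contribution} for one property column.

    Table and column names are hypothetical placeholders.
    """
    allowed = {"Mw", "Tm"}  # whitelist: column names cannot be bound as SQL parameters
    if prop not in allowed:
        raise ValueError(f"unknown property column: {prop}")
    rows = conn.execute(f'SELECT group_name, "{prop}" FROM contributions')
    return {name: value for name, value in rows}

# In-memory stand-in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions (group_name TEXT PRIMARY KEY, Mw REAL, Tm REAL)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [("CH3", 15.035, None), ("CH2", 14.027, None)],  # Tm left empty on purpose
)
mw_contrib = read_contributions(conn, "Mw")
print(mw_contrib)
conn.close()
```

With the real database, `sqlite3.connect("GC_MG1_DB.db")` would replace the in-memory connection, and the query would need to be adapted to the actual schema.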
Define the valid range of each predicted property
A prediction that falls outside its range is recorded as NaN.
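A range check of this kind can be sketched as follows. The helper name `clip_to_nan` and the bounds used in the example are illustrative assumptions; the actual limits used for each property are not listed in this document.

```python
import math

def clip_to_nan(value, low, high):
    """Return value if low <= value <= high, otherwise NaN."""
    if math.isnan(value) or not (low <= value <= high):
        return math.nan
    return value

# Illustrative bounds only; the real limits are not given here.
print(clip_to_nan(592.0224, 0.0, 2000.0))
print(clip_to_nan(-5.0, 0.0, 2000.0))
```

Returning NaN rather than raising an error lets a batch run continue and makes the out-of-range entries easy to spot in the final report.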
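Relative molecular mass
$M_w = \sum_i N_i C_i$
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.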
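For the relative molecular mass, the estimate is a plain count-weighted sum of group contributions, which the short sketch below works through for n-butane (two CH3 and two CH2 groups, as in the fragmentation test output). The function name `weighted_sum` is hypothetical, and the group masses are computed from standard atomic weights (C 12.011, H 1.008) rather than read from GC_MG1_DB.db.

```python
def weighted_sum(counts, contributions):
    """Sum of (number of occurrences) x (contribution) over all groups."""
    return sum(n * contributions[group] for group, n in counts.items())

group_mass = {"CH3": 15.035, "CH2": 14.027}  # g/mol, from atomic weights
butane = {"CH3": 2, "CH2": 2}                # group counts for "CCCC"

mw = weighted_sum(butane, group_mass)
print(round(mw, 3))  # 58.124
```

The other properties use the same group counts; only the contribution column and the function applied to the sum differ from property to property.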
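Melting point
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Boiling point
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Critical temperature
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Critical pressure
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Critical volume
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Standard molar Gibbs energy of formation
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Standard molar enthalpy of formation
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Enthalpy of vaporization (298 K)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Enthalpy of fusion
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Enthalpy of vaporization (at Tb)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Hildebrand solubility parameter
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Flash point
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Surface tension (298 K)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Hansen dispersion solubility parameter
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Hansen polar solubility parameter
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Hansen hydrogen-bond solubility parameter
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Viscosity
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Acentric factor
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Liquid molar volume (298 K)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
-log LC50 (Fathead Minnow 96-hr)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
-log LC50 (Daphnia Magna 48-hr)
where $N_i$ is the number of occurrences of group $i$ and $C_i$ is the contribution of group $i$.
Diffusion coefficient
where $V_c$ is the critical volume.
Thermal conductivity
where $T_c$ is the critical temperature, $T_b$ is the boiling point, and $M_w$ is the relative molecular mass.
Saturated vapor pressure
Density
Solubility parameter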
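Define the group contribution function that computes all properties
Combine all the property calculation functions and compute every target property for the input molecules in batch.
Define the result output function
Format the calculation results for readability and handle value ranges (values outside the valid range are reported as NaN).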
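One way to produce such a report line is sketched below. The helper `format_row` and the column widths are assumptions chosen for illustration; only the four-decimal values and the literal `NaN` text follow the console output shown in this document.

```python
import math

def format_row(key, description, value):
    """Format one property line; NaN is printed as the literal text 'NaN'."""
    label = key + ":"
    text = "NaN" if math.isnan(value) else f"{value:.4f}"
    # Column widths (12, 44, 10) are arbitrary choices for this sketch.
    return f"{label:<12}{description:<44}{text:>10}"

row_mw = format_row("Mw", "Molecular weight (g/mol)", 180.1597)
row_hv = format_row("Hv", "Enthalpy of vaporization at 298K (kJ/mol)", math.nan)
print(row_mw)
print(row_hv)
```

Python's default float formatting would print `nan` in lowercase, so the explicit branch is what keeps the report consistent with the `NaN` entries in the output.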
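Define the main function of the group contribution method
File structure
- Input: SMILES list (input.txt), group SMARTS data (Group_SMARTS.npy), and database (GC_MG1_DB.db).
- Output: group fragmentation results (Group_output.txt) and the property prediction report (console output).
Main function flow
- Pass the SMILES and the temperature to the PyGC main function.
- Run the group fragmentation to obtain the group counts and status.
- Read the database and compute each property.
- Format and print the results, including timing statistics.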
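The flow above can be sketched as a small driver that times the whole run. This is an illustration of the control flow only: `run_pipeline`, `fake_fragment`, and `fake_predict` are hypothetical stand-ins for the notebook's fragmentation and property functions, the temperature argument is omitted, and the status codes follow the convention described earlier (1 on success, 3 when atoms are left uncovered).

```python
import time

def run_pipeline(smiles_list, fragment, predict):
    """fragment(smiles) -> (counts, success, status); predict(counts) -> dict."""
    start = time.perf_counter()
    report = []
    for raw in smiles_list:
        smiles = raw.strip()  # lines read from a file may end with "\n"
        counts, success, status = fragment(smiles)
        properties = predict(counts) if success else {}
        report.append({"smiles": smiles, "status": status, "properties": properties})
    wall_time = time.perf_counter() - start
    return report, wall_time

def fake_fragment(smiles):
    """Stand-in fragmenter that only knows n-butane."""
    table = {"CCCC": {"CH3": 2, "CH2": 2}}
    counts = table.get(smiles)
    return (counts, True, 1) if counts else ({}, False, 3)

def fake_predict(counts):
    """Stand-in predictor: molecular weight from atomic-weight group masses."""
    mass = {"CH3": 15.035, "CH2": 14.027}
    return {"Mw": sum(n * mass[g] for g, n in counts.items())}

report, wall_time = run_pipeline(["CCCC\n", "XYZ"], fake_fragment, fake_predict)
print(report[0]["properties"], report[1]["status"])
print(f"WALL TIME: {wall_time:.4f} (s)")
```

Passing the two stages in as callables keeps the driver independent of RDKit and the database, so the loop and the timing can be checked on their own.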
Test function
********************************************************************************
> PyGC: Group Contribution based Property Prediction
> Lei Zhang (keleiz@dlut.edu.cn); Qilei Liu (liuqilei@dlut.edu.cn)
> Mar. 22, 2025
> Institute of chemical process systems engineering
> School of chemical engineering
> Dalian University of Technology, Dalian 116024, China
********************************************************************************
Property prediction (Group contribution):
--------------------------------------------------------------------------------
1 CC(=O)OC1=CC=CC=C1C(O)=O
Mw: Molecular weight (g/mol) 180.1597
Tm: Normal melting point (K) 436.6321
Tb: Normal boiling point (K) 592.0224
Tc: Critical temperature (K) 891.4422
Pc: Critical pressure (bar) 34.3293
Vc: Critical volume (cm^3/mol) 471.2066
Gf: Standard Gibbs free energy of formation -532.5249
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation -680.0922
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol) NaN
Hfus: Enthalpy of fusion (kJ/mol) 40.4855
Hvb: Enthalpy of vaporization at Tb (kJ/mol) NaN
Solp: Hildebrand solubility parameter 20.9713
    at 298K (MPa^0.5)
Fp: Flash point (K) 483.2930
St: Surface tension at 298K (dym/cm) NaN
hspD: Hansen dispersive solubility parameter 21.0164
    (MPa^0.5)
hspP: Hansen polar solubility parameter 6.0868
    (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter 9.8884
    (MPa^0.5)
visc: Viscosity (cP) 0.4874
Acentric: Pitzer's Acentric Factor 0.8252
Vm_298: Liquid molar volume at 298K (cm^3/mol) 155.6850
nlogLC50FM: Fathead Minnow 96-hr LC50 2.7027
    (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50 -0.8246
    (-log(mol/L))
Dw: Diffusion coefficient (cm/s) 0.6770
lambda: Thermal conductivity (W/m/K) 0.1191
Psat: Vapor pressure (Pa) 0.4547
Omega: Compressibility factor 0.2928
rho: Density (g/cm^3) 1.0280
SolPar: Solubility parameter (MPa^0.5) NaN
--------------------------------------------------------------------------------
2 CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
Mw: Molecular weight (g/mol) 206.2852
Tm: Normal melting point (K) 343.3547
Tb: Normal boiling point (K) 582.2050
Tc: Critical temperature (K) 786.3869
Pc: Critical pressure (bar) 22.6432
Vc: Critical volume (cm^3/mol) 663.4145
Gf: Standard Gibbs free energy of formation -187.4798
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation -448.4679
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol) 77.6696
Hfus: Enthalpy of fusion (kJ/mol) 28.0466
Hvb: Enthalpy of vaporization at Tb (kJ/mol) NaN
Solp: Hildebrand solubility parameter 19.2869
    at 298K (MPa^0.5)
Fp: Flash point (K) 446.5630
St: Surface tension at 298K (dym/cm) 28.7115
hspD: Hansen dispersive solubility parameter 17.6759
    (MPa^0.5)
hspP: Hansen polar solubility parameter 2.0559
    (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter 4.3485
    (MPa^0.5)
visc: Viscosity (cP) 13.4884
Acentric: Pitzer's Acentric Factor 0.8248
Vm_298: Liquid molar volume at 298K (cm^3/mol) 202.2285
nlogLC50FM: Fathead Minnow 96-hr LC50 3.9587
    (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50 4.2037
    (-log(mol/L))
Dw: Diffusion coefficient (cm/s) 0.5796
lambda: Thermal conductivity (W/m/K) 0.1218
Psat: Vapor pressure (Pa) 0.0451
Omega: Compressibility factor 0.6762
rho: Density (g/cm^3) 1.1090
SolPar: Solubility parameter (MPa^0.5) 20.1055
--------------------------------------------------------------------------------
3 CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1
Mw: Molecular weight (g/mol) 239.3089
Tm: Normal melting point (K) 408.2666
Tb: Normal boiling point (K) 642.1434
Tc: Critical temperature (K) 835.7446
Pc: Critical pressure (bar) 28.0000
Vc: Critical volume (cm^3/mol) 713.2700
Gf: Standard Gibbs free energy of formation -201.2924
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation -551.1721
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol) 151.4397
Hfus: Enthalpy of fusion (kJ/mol) 38.3394
Hvb: Enthalpy of vaporization at Tb (kJ/mol) NaN
Solp: Hildebrand solubility parameter 25.0683
    at 298K (MPa^0.5)
Fp: Flash point (K) 578.0831
St: Surface tension at 298K (dym/cm) 38.4238
hspD: Hansen dispersive solubility parameter 17.8190
    (MPa^0.5)
hspP: Hansen polar solubility parameter 7.7930
    (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter 24.9708
    (MPa^0.5)
visc: Viscosity (cP) 5568.3835
Acentric: Pitzer's Acentric Factor 0.8459
Vm_298: Liquid molar volume at 298K (cm^3/mol) 206.8158
nlogLC50FM: Fathead Minnow 96-hr LC50 3.1616
    (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50 -0.5715
    (-log(mol/L))
Dw: Diffusion coefficient (cm/s) 0.5609
lambda: Thermal conductivity (W/m/K) 0.1218
Psat: Vapor pressure (Pa) 0.0000
Omega: Compressibility factor 1.1138
rho: Density (g/cm^3) 2.1342
SolPar: Solubility parameter (MPa^0.5) 36.4482
--------------------------------------------------------------------------------
4 Cc1c([N+](=O)[O-])cc([N+](=O)[O-])cc1[N+](=O)[O-]
Mw: Molecular weight (g/mol) 227.1332
Tm: Normal melting point (K) 396.5192
Tb: Normal boiling point (K) 633.1089
Tc: Critical temperature (K) 862.9637
Pc: Critical pressure (bar) 31.2272
Vc: Critical volume (cm^3/mol) 578.2647
Gf: Standard Gibbs free energy of formation 196.8224
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation -40.1631
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol) 99.9948
Hfus: Enthalpy of fusion (kJ/mol) 32.1185
Hvb: Enthalpy of vaporization at Tb (kJ/mol) NaN
Solp: Hildebrand solubility parameter 22.3972
    at 298K (MPa^0.5)
Fp: Flash point (K) 554.0767
St: Surface tension at 298K (dym/cm) 64.1788
hspD: Hansen dispersive solubility parameter 20.3463
    (MPa^0.5)
hspP: Hansen polar solubility parameter 11.4278
    (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter 5.3196
    (MPa^0.5)
visc: Viscosity (cP) 35.2759
Acentric: Pitzer's Acentric Factor 0.8281
Vm_298: Liquid molar volume at 298K (cm^3/mol) 148.2411
nlogLC50FM: Fathead Minnow 96-hr LC50 4.8665
    (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50 4.1870
    (-log(mol/L))
Dw: Diffusion coefficient (cm/s) 0.6169
lambda: Thermal conductivity (W/m/K) 0.1180
Psat: Vapor pressure (Pa) 0.0004
Omega: Compressibility factor 0.7850
rho: Density (g/cm^3) 1.6942
SolPar: Solubility parameter (MPa^0.5) 26.9697
--------------------------------------------------------------------------------
5 C(=C(C(=Cc1ccccc1)c1ccccc1)c1ccccc1)c1ccccc1
Mw: Molecular weight (g/mol) 358.4774
Tm: Normal melting point (K) 433.2688
Tb: Normal boiling point (K) 741.3036
Tc: Critical temperature (K) 1061.0020
Pc: Critical pressure (bar) 18.8377
Vc: Critical volume (cm^3/mol) 1195.7405
Gf: Standard Gibbs free energy of formation 742.1677
    at 298K (kJ/mol)
Hf: Standard enthalpy of formation 487.1753
    at 298K (kJ/mol)
Hv: Enthalpy of vaporization at 298K (kJ/mol) 135.8193
Hfus: Enthalpy of fusion (kJ/mol) 47.9394
Hvb: Enthalpy of vaporization at Tb (kJ/mol) 95.9622
Solp: Hildebrand solubility parameter 19.4211
    at 298K (MPa^0.5)
Fp: Flash point (K) 601.7472
St: Surface tension at 298K (dym/cm) NaN
hspD: Hansen dispersive solubility parameter 38.2952
    (MPa^0.5)
hspP: Hansen polar solubility parameter 7.9674
    (MPa^0.5)
hspH: Hansen Hydrogen-bond solubility parameter 4.1192
    (MPa^0.5)
visc: Viscosity (cP) 4.0718
Acentric: Pitzer's Acentric Factor 0.8221
Vm_298: Liquid molar volume at 298K (cm^3/mol) 314.8005
nlogLC50FM: Fathead Minnow 96-hr LC50 5.4598
    (-log(mol/L))
nlogLC50DM: Daphnia Magna 48-hr LC50 -0.8082
    (-log(mol/L))
Dw: Diffusion coefficient (cm/s) 0.4437
lambda: Thermal conductivity (W/m/K) 0.0932
Psat: Vapor pressure (Pa) 0.0026
Omega: Compressibility factor 0.2668
rho: Density (g/cm^3) 0.9526
SolPar: Solubility parameter (MPa^0.5) 18.8236
--------------------------------------------------------------------------------
WALL TIME: 0.1019 (s)
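Notes
- Group coverage: if a molecule contains groups that are not defined (status code 3/4), the prediction fails and the group SMARTS library must be extended.
- Formula limitations: some property formulas were fitted to specific datasets, so extrapolating beyond them may give large errors.
Summary
This code implements the complete workflow from molecular structure to multi-property prediction. It combines RDKit's molecule handling with the theoretical framework of the group contribution method and provides an efficient property prediction tool for chemistry, chemical engineering, and materials science. Its core logic covers SMARTS pattern matching, group counting, formula evaluation, and result visualization, and it is suited to batch analysis of molecular properties.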