©️ Copyright 2024 @ Authors
📖 Getting Started Guide
Licensing Agreement: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This document can be executed directly on the Bohrium Notebook. To begin, click the Connect button located at the top of the interface. We have already set up the recommended image ubuntu:22.04-py3.10-pytorch2.0 and the recommended machine type c2_m4_cpu for you.
Uni-pKa is a pKa prediction framework published in the article "Bridging Machine Learning and Thermodynamics for Accurate pKa Prediction".
Implementation of Uni-pKa model
Unfold the hidden blocks if you are interested in the implementation details; otherwise, click the "run all" button to initialize everything on your first run.
Loading libraries
Dictionary class
Atom type and charge dictionary
3D conformational generation
Transformer backbone
Uni-Mol model
Molecular dataset
Interface for free energy inference
Micro pKa prediction
Load model weights and initialize the predictor.
Important Note: The weight provided here is a single model fine-tuned on the Dwar-iBonD and Novartis datasets for general-purpose inference. Its predictions may differ slightly from those reported in the article, which were produced by a 5-fold ensemble fine-tuned on the Dwar-iBonD dataset only.
Protonation/deprotonation function for a molecule, given the index of the atom to be protonated or deprotonated.
Glycine with atom indices
Deprotonate atom 4 of glycine (the oxygen of the carboxylic group)
Protonate atom 0 of glycine (the nitrogen of the amino group)
Deprotonating the carboxylic acid of the protonated glycine and protonating the amino group of the deprotonated glycine both lead to the zwitterion form
Micro-pKa prediction function
Predict all micro-pKa values among the four protonation states of glycine above, and check the consistency of the thermodynamic cycle formed by the two deprotonation routes: through the uncharged form or through the zwitterion form.
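The cycle check can be illustrated without the model. The sketch below assumes that each protonation state has one dimensionless free energy (in units of RT, with the proton reference at pH 0) and that a micro-pKa is the free energy difference between base and acid divided by ln 10. The numbers and the helper name `micro_pka` are hypothetical and are not outputs or functions of Uni-pKa.

```python
import math

LN10 = math.log(10.0)

# Hypothetical free energies in units of RT (illustrative values, not model output).
free_energy = {
    "cation": 0.0,       # protonated amine, protonated carboxylic acid
    "neutral": 17.0,     # uncharged form
    "zwitterion": 5.4,   # protonated amine, deprotonated carboxylic acid
    "anion": 27.9,       # deprotonated amine, deprotonated carboxylic acid
}

def micro_pka(acid, base):
    """Micro-pKa of the step acid -> base + H+ under the assumption above."""
    return (free_energy[base] - free_energy[acid]) / LN10

# Two deprotonation routes from the cation to the anion.
route_neutral = micro_pka("cation", "neutral") + micro_pka("neutral", "anion")
route_zwitterion = micro_pka("cation", "zwitterion") + micro_pka("zwitterion", "anion")

print(f"via uncharged form: {route_neutral:.3f}")
print(f"via zwitterion:     {route_zwitterion:.3f}")
```

Because both routes start and end at the same states, the two sums agree whenever every micro-pKa is derived from per-state free energies. A mismatch in practice would indicate that the four predictions were not obtained from a single consistent set of free energies.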
Microstate Enumerator
Enumeration function that, given an ionization site template, starts from a single protonation state and returns its whole macrostate together with the macrostate obtained after one protonation or deprotonation.
We define a minimal ionization template for our amino acid example, which only includes the ionization of carboxylic acids and amines.
This is a glutamic acid.
Enumerate the macrostates of the glutamic acid and its deprotonated form.
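As a toy picture of what enumeration produces, the sketch below ignores SMARTS matching entirely and treats glutamic acid as three independent ionization sites. The site table and the function name `enumerate_macrostates` are hypothetical simplifications, not the notebook's enumerator; the point is only that microstates sharing a net charge form one macrostate.

```python
from collections import defaultdict
from itertools import product

# Hypothetical site table: (label, charge of the site when it is protonated).
SITES = [("alpha-COOH", 0), ("side-chain COOH", 0), ("alpha-NH2", +1)]

def enumerate_macrostates(sites):
    """Group every protonation pattern of the sites by its net charge."""
    macrostates = defaultdict(list)
    for pattern in product([True, False], repeat=len(sites)):
        # A deprotonated site carries one unit less charge than its protonated form.
        charge = sum(q if has_h else q - 1 for (_, q), has_h in zip(sites, pattern))
        macrostates[charge].append(pattern)
    return dict(macrostates)

macrostates = enumerate_macrostates(SITES)
for charge in sorted(macrostates, reverse=True):
    print(f"net charge {charge:+d}: {len(macrostates[charge])} microstate(s)")
```

In this simplified model the uncharged macrostate and its singly deprotonated form each contain three microstates, while the fully protonated and fully deprotonated macrostates contain one each.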
Drawing a macrostate with microstate indices
Continue to generate the fully protonated macrostate and the fully deprotonated one
Macro pKa prediction
Macro-pKa prediction function
Predict all macro-pKa values of the glutamic acid and show the corresponding acid and base macrostates.
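The relation between microstate free energies and a macro-pKa can be sketched as follows. It assumes that every microstate has a dimensionless free energy in units of RT referenced to pH 0, so that a macrostate's population is proportional to its Boltzmann sum. The function names and the example energies are hypothetical; the notebook's own prediction function may add a constant offset or use different units.

```python
import math

LN10 = math.log(10.0)

def log10_boltzmann_sum(free_energies):
    """Return log10(sum_i exp(-G_i)), shifting by the minimum for numerical stability."""
    lowest = min(free_energies)
    shifted = sum(math.exp(-(g - lowest)) for g in free_energies)
    return math.log10(shifted) - lowest / LN10

def macro_pka(acid_energies, base_energies):
    """Macro-pKa between an acid macrostate and its conjugate base macrostate."""
    return log10_boltzmann_sum(acid_energies) - log10_boltzmann_sum(base_energies)

# Hypothetical free energies (units of RT) for three acid and three base microstates.
acid = [0.0, 1.2, 6.0]
base = [9.5, 10.1, 14.0]
print(f"macro-pKa: {macro_pka(acid, base):.2f}")
```

With one microstate on each side the expression reduces to the micro-pKa, and every additional low-energy acid microstate raises the macro-pKa because the acid macrostate gains population.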
Distribution fraction prediction
Standardized microstate name calculation function.
Enumeration function that starts with one microstate and ends with the whole protonation ensemble, given the ionization templates.
The protonation ensemble of glutamic acid when the ionization of its carboxylic groups and amino group is considered.
Prediction function for fractions of microstates in the protonation ensemble at given pH.
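A minimal sketch of the underlying arithmetic, under the assumption that each microstate i has a dimensionless free energy G_i (units of RT, pH 0 reference) and carries n_i dissociable protons: its weight at a given pH is exp(-G_i) times 10^(-n_i * pH), and the fractions are the normalized weights. The function name `fractions_at_ph` and the two-state example are hypothetical.

```python
import math

LN10 = math.log(10.0)

def fractions_at_ph(states, ph):
    """states maps a name to (free energy in RT, number of dissociable protons)."""
    # Log-weight of each microstate: -G_i - n_i * pH * ln(10).
    log_w = {name: -g - n * ph * LN10 for name, (g, n) in states.items()}
    top = max(log_w.values())  # subtract the maximum to avoid overflow
    weights = {name: math.exp(lw - top) for name, lw in log_w.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical monoprotic acid with pKa 4: G(A-) - G(HA) = 4 * ln(10).
states = {"HA": (0.0, 1), "A-": (4 * LN10, 0)}
for ph in (2.0, 4.0, 7.4):
    print(ph, {k: round(v, 4) for k, v in fractions_at_ph(states, ph).items()})
```

For two states this reproduces the Henderson-Hasselbalch behaviour: equal fractions at pH = pKa, and a tenfold change in the ratio for each pH unit.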
Play with more complete templates
Template reading function for a tab-separated (TSV) template file.
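A minimal sketch of reading such a file with the standard library is given here. The column names follow the template listing printed in this guide (Substructure, SMARTS, Index, Acid_or_base); the assumption that the file has a header row with exactly these names, the in-memory sample, and the function name `read_template` are illustrative and not the notebook's own reader.

```python
import csv
import io

# Two rows copied from the template listing in this guide; the header layout is assumed.
SAMPLE_TSV = (
    "Substructure\tSMARTS\tIndex\tAcid_or_base\n"
    "Carboxyl acid\t[$([#6]=[#8,#7]),$(C#N):0]-[OX2:1]-[H:2]\t1\tA\n"
    "Phenol\t[c,n:0]-[OX2:1]-[H:2]\t1\tA\n"
)

def read_template(handle):
    """Group template rows by their Acid_or_base flag."""
    templates = {}
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        entry = (row["Substructure"], row["SMARTS"], int(row["Index"]))
        templates.setdefault(row["Acid_or_base"], []).append(entry)
    return templates

templates = read_template(io.StringIO(SAMPLE_TSV))
for name, smarts, index in templates["A"]:
    print(f"{name}: ionizable atom map index {index}, pattern {smarts}")
```

To read a real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open(path, newline="")`.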
More complete ionization templates are provided in the dataset. The file "simple_smarts_pattern.tsv" collects ionization patterns that are common in medicinal chemistry and is suitable for general-purpose use.
The more radical "smarts_pattern.tsv" covers all ionization patterns in our training set and was used in the original paper to evaluate macro-pKa prediction. Warning: with this template, protonation states that are very unreasonable in aqueous solution may be enumerated, and in some cases they drastically affect the predicted distribution fractions!
     Substructure                  SMARTS                                              Index  Acid_or_base
0    Sulfate monoether             [SX4:0](=[O:1])(=[O:2])(-[O:3])-[OX2:4]-[H:5]       4      A
2    Sulfonic acid                 [SX4:0](=[O:1])(=[O:2])(-[#6,#7:3])-[OX2:4]-[H:5]   4      A
4    Sulfinic acid                 [SX3:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]           3      A
6    Seleninic acid                [SeX3:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]          3      A
8    Selenenic acid                [SeX2:0]-[OX2:1]-[H:2]                              1      A
10   Arsonic acid                  [AsX4:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]          3      A
12   Thiosulfuric acid             [S:0]~[SX4:1](~[O:2])(~[O:3])-[O:4]-[H:5]           4      A
14   Phosph(o/i)nic acid           [PX4:0](=[O:1])(-[OX2:2]-[H:5])(-[#1,#6,#7,#8:...   2      A
16   Phosphate (mono/di)ether      [PX4:0](=[O:1])(-[O:2])(-[O:3])-[OX2:4]-[H:5]       4      A
18   Carboxyl acid                 [$([#6]=[#8,#7]),$(C#N):0]-[OX2:1]-[H:2]            1      A
20   Carboxyl acid enol            [C:0]=[C:1](-[OX2:2]-[H:3])-[OX2:4]-[H:5]           4      A
22   Carbo(di)thioic acid          [CX3:0](=[O,S:1])-[SX2,OX2:2]-[H:3]                 2      A
24   Carboxyl acid vinylogue       [O:0]=[C:1]-[C:2]=[C:3]-[OX2:4]-[H:5]               4      A
26   Thiol/Thiophenol              [#6,#7:0]-[SX2:1]-[H:2]                             1      A
28   Phenol                        [c,n:0]-[OX2:1]-[H:2]                               1      A
30   Hydroperoxide/Hydroxyl amine  [O,N:0]-[OX2:1]-[H:2]                               1      A
32   Azole                         [#7:0]1(-[H:5])-,:[#7,#6:1]=,:[#7,#6:2]-,:[#7,...   0      A
34   Aza-aromatics                 [n:0]-[H:1]                                         0      A
36   Oxime                         [$([#7]:,=[#6,#7]),$([#7]:,=[#6,#7]:,-[#6,#7]:...   1      A
38   Amine                         [NX4+1:0](-[H:4])(-[CX4,c,#7,#8,#1,S,$(C=C),Cl...   0      A
40   Imine                         [#6,#7,P,S:0]=[NX3+1:1](-[H:2])                     1      A
42   Amide                         [$([#7]=[#7,#8]),$(c:c:c:c:[#7+1]):0]-[NX3:1]-...   1      A
44   Amide imine                   [$([#6]-,:[O,S,#7]),N+1:0]=,:[NX2:1]-[H:2]          1      A
46   Sulfamide                     [SX4:0](=[O:1])(=[O:2])-[NX3:3]-[H:4]               3      A
48   Phosphamide                   [PX4:0](=[O:1])-[NX3:2]-[H:3]                       2      A
50   Enol                          [$([#6]=,:[#7,#8]),$(C#N),#7+1,$([S]=[O]),OH1:...   3      A
52   Hydrocyanic acid              [N:0]#[C:1]-[H:2]                                   1      A
54   Selenol                       [SeX2:0]-[H:1]                                      0      A
     Substructure                  SMARTS                                              Index  Acid_or_base
0    Sulfate monoether             [SX4:0](=[O:1])(=[O:2])(-[O:3])-[OX2:4]-[H:5]       4      A
2    Sulfonic acid                 [SX4:0](=[O:1])(=[O:2])(-[#6,#7:3])-[OX2:4]-[H:5]   4      A
4    Sulfinic acid                 [SX3:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]           3      A
6    Seleninic acid                [SeX3:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]          3      A
8    Selenenic acid                [SeX2:0]-[OX2:1]-[H:2]                              1      A
10   Arsonic acid                  [AsX4:0](=[O:1])(-[#6,#7:2])-[OX2:3]-[H:4]          3      A
12   Thiosulfuric acid             [S:0]~[SX4:1](~[O:2])(~[O:3])-[O:4]-[H:5]           4      A
14   Phosph(o/i)nic acid           [PX4:0](=[O:1])(-[OX2:2]-[H:5])(-[#1,#6,#7,#8:...   2      A
16   Phosphate (mono/di)ether      [PX4:0](=[O:1])(-[O:2])(-[O:3])-[OX2:4]-[H:5]       4      A
18   Carboxyl acid                 [$([#6]=[#8,#7]),$(C#N):0]-[OX2:1]-[H:2]            1      A
20   Carboxyl acid enol            [C:0]=[C:1](-[OX2:2]-[H:3])-[OX2:4]-[H:5]           4      A
22   Carbo(di)thioic acid          [CX3:0](=[O,S:1])-[SX2,OX2:2]-[H:3]                 2      A
24   Carboxyl acid vinylogue       [O:0]=[C:1]-[C:2]=[C:3]-[OX2:4]-[H:5]               4      A
26   Thiol/Thiophenol              [#6,#7:0]-[SX2:1]-[H:2]                             1      A
28   Phenol                        [c,n:0]-[OX2:1]-[H:2]                               1      A
30   Alcohol                       [$([CX4]-[$([#6]=,:[#7,#8]),$([#6]=,:[#6]-,:[#...   1      A
32   Hydroxypyridine               [n:0]:[c:1]-[OH2+1:2]-[H:3]                         2      A
34   Methylpyridine                [n:0](-[C:1]=[O:2]):[c:3]:[c:4]:[c:5]-[CX4:6]-...   6      A
36   Hydroperoxide/Hydroxyl amine  [O,N:0]-[OX2:1]-[H:2]                               1      A
38   Azole                         [#7:0]1(-[H:5])-,:[#7,#6:1]=,:[#7,#6:2]-,:[#7,...   0      A
40   Aza-aromatics                 [n:0]-[H:1]                                         0      A
42   N-substitute aza-aromatics    [n+1:0]-[CX4:1]-[H:2]                               1      A
44   Oxime                         [$([#7]:,=[#6,#7]),$([#7]:,=[#6,#7]:,-[#6,#7]:...   1      A
46   Amine                         [NX4+1:0](-[H:4])(-[CX4,c,#7,#8,#1,S,$(C=C),Cl...   0      A
48   Imine                         [#6,#7,P,S:0]=[NX3+1:1](-[H:2])                     1      A
50   Amide                         [$([#6]=,:[O,S,#7:0]),$([#7]=[#7,#8]),$([#6]:,...   1      A
52   Amide imine                   [$([#6]-,:[O,S,#7]),N+1:0]=,:[NX2:1]-[H:2]          1      A
54   Sulfamide                     [SX4:0](=[O:1])(=[O:2])-[NX3:3]-[H:4]               3      A
56   Phosphamide                   [PX4:0](=[O:1])-[NX3:2]-[H:3]                       2      A
58   Amide vinylogue               [NX3:0](-[H:5])-,:[#6:1]=,:[#6:2]-,:[$([#6]=,:...   0      A
60   Di Carbonyl βH                [$([#6,#7]=,:[#7,#8]),$(C#N),$([#6]=,:[#6]-,:[...   1      A
62   Carbonyl βH                   [$([#6](=O)(-,:[#7+1,#6,#1])(-,:[#6,#1])),$([N...   1      A
64   Carbonyl allene               [O:0]=[C:1]-[C:2]=[C:3]=[CX3:4]-[H:5]               4      A
66   Enol                          [$([#6]=,:[#7,#8]),$(C#N),#7+1,$([S]=[O]),c,$(...   3      A
68   Enol                          [#6:0]=[#6:1](-[$(C=O),$(C(=C)-[OH1]):2])-[OX2...   3      A
70   Acyl group                    [#6:0](-[O,N:1])=[OX2+1:2]-[H:3]                    2      A
72   Sulfoxide                     [S+1:0](-[OX2:1]-[H:4])(-[#6:2])(-[#6:3])           1      A
74   Sulfoxide                     [S:0](=[OX2+1:1]-[H:4])(-[#6:2])(-[#6:3])           1      A
76   Sulfoxide                     [S:0](=[OX2+1:1]-[H:3])(=[#6:2])                    1      A
78   Hydrocyanic acid              [N:0]#[C:1]-[H:2]                                   1      A
80   Phosphoryl group              [PX4:0]=[OX2+1:1]-[H:3]                             1      A
82   Selenonyl group               [Se:0]=[OX2+1:1]-[H:3]                              1      A
84   Arsenyl group                 [AsX4:0]=[OX2+1:1]-[H:2]                            1      A
86   Carboxyl group                [#6X3:0](:,-[O,#7,S:1])=[OX2+1,SX2+1:2]-[H:3]       2      A
88   Carboxyl group vinylogue      [#6X3:0](:,-[#6:1]:,=[#6:2]:,-[O,#7,S:3])=[OX2...   4      A
90   Carbonyl group                [#6X3:0](:,-[#1,#6:1])(:,-[#1,#6:2])=[OX2+1:3]...   3      A
92   Cyano group                   [C:0]#[N:1]-[H:2]                                   1      A
94   Hydroxyl group                [CX4:0](-[#6,#1:1])(-[#6,#1:2])(-[#6,#1:3])-[O...   4      A
96   Selenol                       [SeX2:0]-[H:1]                                      0      A
98   Borate                        [BX3:0]-[OX2:1]-[H:2]                               1      A
100  Bromomethane                  [Br:0]-[CH3:1]-[H:2]                                1      A
102  Cyclopentadiene               [#6X4:0](-[#1:5])1-,:[#6:1]=,:[#6:2]-,:[#6:3]=...   0      A
104  Tin alkyl                     [N+:0]-[CX4:1](-[H:3])-[SnX4:2]                     1      A
Here we try out the drug molecule Amoxicillin.
A very radical enumeration!