Integrating Machine Learning and Quantum Circuits for Proton Affinity Predictions

Hongni Jin, Kenneth M. Merz Jr.
2024-11-27
Abstract: A key step in interpreting gas-phase ion mobility coupled with mass spectrometry (IM-MS) data for unknown structure prediction involves identifying the most favorable protonated structure. In the gas phase, the site of protonation is determined using proton affinity (PA) measurements. Currently, mass spectrometry and ab initio computation methods are widely used to evaluate PA; however, both methods are resource-intensive and time-consuming. Therefore, there is a critical need for efficient methods to estimate PA, enabling the rapid identification of the most favorable protonation site in complex organic molecules with multiple proton binding sites. In this work, we developed a fast and accurate method for PA prediction by using multiple descriptors in combination with machine learning (ML) models. Using a comprehensive set of 186 descriptors, our model demonstrated strong predictive performance, with an R² of 0.96 and a MAE of 2.47 kcal/mol, comparable to experimental uncertainty. Furthermore, we designed quantum circuits as feature encoders for a classical neural network. To evaluate the effectiveness of this hybrid quantum-classical model, we compared its performance with traditional ML models using a reduced feature set derived from the full set. The result showed that this hybrid model achieved consistent performance comparable to traditional ML models with the same reduced feature set on both a noiseless simulator and real quantum hardware, highlighting the potential of quantum machine learning for accurate and efficient PA predictions.
Machine Learning, Chemical Physics, Quantum Physics
What problem does this paper attempt to address?
The paper addresses how to predict the proton affinity (PA) of complex organic molecules efficiently and accurately, so that the most favorable protonation site can be identified quickly. This is a key step in interpreting gas-phase ion mobility-mass spectrometry (IM-MS) data to predict unknown structures.

### Problem Background:

1. **Limitations of Experimental Methods**:
   - Experimental methods for measuring PA, such as mass spectrometry (MS), photoionization mass spectrometry, and Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are precise but resource-intensive and time-consuming.
2. **Limitations of Computational Methods**:
   - Theoretical methods, such as the Gn series (G1, G2, G3, G4), GnMP2, and Weizmann-1 (W1, W1BD), can provide absolute PA values, but their computational cost makes them difficult to apply to large molecules.

### Solution:

The paper proposes a fast and accurate PA prediction method that combines machine learning (ML) with quantum circuits. The specific steps are as follows:

1. **Selection of Feature Descriptors**:
   - Use 186 descriptors (2D/3D physicochemical properties, quantum-chemical descriptors, and fingerprint-type descriptors) as input features to characterize each molecule comprehensively.
2. **Application of Traditional Machine-Learning Models**:
   - Use traditional ML models such as support vector regression (SVR), random forest regression (RFR), extreme gradient boosting (XGBoost), and gradient-boosted decision trees (GBDT) for PA prediction, and improve accuracy further through ensemble learning (such as a Voting Regressor).
3. **Design of Quantum-Classical Hybrid Models**:
   - Design a parameterized quantum circuit as a feature encoder that encodes classical data into quantum states, whose output is then processed by a classical neural network (NN). This hybrid quantum neural network (QNN) model can reduce the number of required parameters while maintaining high accuracy.

### Main Achievements:

- **Prediction Performance**: With the full set of 186 descriptors, the classical model reaches an R² of 0.96 and an MAE of 2.47 kcal/mol, comparable to experimental uncertainty. On a reduced feature set derived from the full set, the hybrid QNN model performs consistently and on par with traditional ML models, on both a noiseless simulator and real quantum hardware.
- **Efficiency Improvement**: Compared with resource-intensive experimental and ab initio methods, the model provides fast PA estimates, and the hybrid design reduces the number of trainable parameters.

Together, these results offer a new route to predicting proton affinity efficiently and accurately, which helps accelerate the identification and analysis of complex organic molecule structures.
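
The quantum feature encoder described in step 3 of the solution can be illustrated with a minimal sketch. The paper's actual circuit layout is not given in this summary, so everything below is an assumption: angle encoding with RY rotations, a chain of CNOT gates for entanglement, one trainable RY layer, and Pauli-Z expectation values as the encoded features. The function names (`apply_ry`, `apply_cnot`, `expect_z`, `encode`) are hypothetical, and the circuit is simulated exactly in plain Python rather than run on a quantum simulator backend or hardware.

```python
import math

# Assumed convention: qubit q corresponds to bit q of the basis-state index.
# RY and CNOT keep amplitudes real, so the state is a plain list of floats.


def apply_ry(state, qubit, angle):
    """Apply the rotation RY(angle) to one qubit of the statevector."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    new = state[:]
    bit = 1 << qubit
    for i in range(len(state)):
        if not i & bit:          # i has the qubit in |0>, j has it in |1>
            j = i | bit
            a0, a1 = state[i], state[j]
            new[i] = c * a0 - s * a1
            new[j] = s * a0 + c * a1
    return new


def apply_cnot(state, control, target):
    """Flip `target` wherever `control` is |1> (swap the paired amplitudes)."""
    new = state[:]
    c_bit, t_bit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & c_bit and not i & t_bit:
            j = i | t_bit
            new[i], new[j] = state[j], state[i]
    return new


def expect_z(state, qubit):
    """Expectation value of Pauli-Z on one qubit, in [-1, 1]."""
    bit = 1 << qubit
    return sum((-1 if i & bit else 1) * amp * amp for i, amp in enumerate(state))


def encode(features, thetas):
    """Map classical features to per-qubit <Z> values (one qubit per feature).

    features: values already scaled to an angle range such as [0, pi].
    thetas:   trainable rotation angles, one per qubit (assumed layout).
    """
    n = len(features)
    state = [0.0] * (2 ** n)
    state[0] = 1.0                               # start in |00...0>
    for q, x in enumerate(features):             # angle-encode the data
        state = apply_ry(state, q, x)
    for q in range(n - 1):                       # entangle neighbouring qubits
        state = apply_cnot(state, q, q + 1)
    for q, theta in enumerate(thetas):           # trainable layer
        state = apply_ry(state, q, theta)
    return [expect_z(state, q) for q in range(n)]


if __name__ == "__main__":
    descriptors = [0.2, 1.1, 2.5]   # three hypothetical, pre-scaled descriptors
    thetas = [0.1, -0.3, 0.7]       # hypothetical trainable parameters
    print(encode(descriptors, thetas))
```

In a hybrid model of this kind, the returned expectation values would be passed to a classical neural network, and `thetas` would be trained jointly with the network weights. Because each qubit carries one feature, the exact simulation grows as 2^n in the number of features, which is one reason a reduced feature set is a natural fit for such an encoder.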