A scalable and integrated machine learning framework for molecular properties prediction

Guzhong Chen, Zhen Song, Zhiwen Qi, Kai Sundmacher
DOI: https://doi.org/10.1002/aic.18185
IF: 4.167
2023-01-01
AIChE Journal
Abstract: This work introduces a scalable and integrated machine learning (ML) framework that facilitates the important steps of building quantitative structure-property relationship (QSPR) models for molecular property prediction. Specifically, molecular descriptor generation, feature engineering, ML model training, model selection and ensembling, and model validation and timing are integrated into a single workflow within the proposed framework. Unlike existing modeling methods, which rely on human experts and focus primarily on model/hyperparameter selection, the proposed framework succeeds by ensembling multiple models and stacking them in multiple layers. The efficiency and effectiveness of the framework are demonstrated through comparisons with literature-reported QSPR models on identical datasets in three property modeling case studies: the flash point temperature, the melting temperature, and the octanol-water partition coefficient. While requiring much less modeling time, the models produced by the proposed framework show better predictive performance than the reference models in all three case studies.
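The stacking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numeric descriptor, two simple base learners, one stacking layer (the paper describes multiple layers), and synthetic data invented purely for illustration. All function names (`fit_linear`, `fit_knn`, `out_of_fold`, `fit_stack`, `rmse`) are hypothetical.

```python
import math
import random
import time
from statistics import mean


def fit_linear(xs, ys):
    """Base learner 1: least-squares line y = a*x + b on one descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def fit_knn(xs, ys, k=3):
    """Base learner 2: k-nearest-neighbour regressor on one descriptor."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict each sample with a model that never saw it during fitting."""
    n = len(xs)
    preds = [0.0] * n
    for f in range(n_folds):
        held = set(range(f, n, n_folds))
        tr_x = [xs[i] for i in range(n) if i not in held]
        tr_y = [ys[i] for i in range(n) if i not in held]
        model = fit(tr_x, tr_y)
        for i in held:
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, base_fitters=(fit_linear, fit_knn)):
    """One stacking layer over exactly two base learners.

    The meta-learner is a convex blend w*m1 + (1-w)*m2, with w chosen by
    least squares on the out-of-fold predictions and clipped to [0, 1].
    """
    p1, p2 = (out_of_fold(f, xs, ys) for f in base_fitters)
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    num = sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    m1, m2 = (f(xs, ys) for f in base_fitters)  # refit on all training data
    return (lambda x: w * m1(x) + (1 - w) * m2(x)), w


def rmse(model, xs, ys):
    """Root-mean-square error of a model on held-out data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Synthetic "descriptor -> property" data, invented for this sketch only.
rng = random.Random(0)
data_x = [rng.uniform(0.0, 10.0) for _ in range(60)]
data_y = [2.0 * x + 3.0 * math.sin(x) + rng.gauss(0.0, 0.5) for x in data_x]
train_x, test_x = data_x[:45], data_x[45:]
train_y, test_y = data_y[:45], data_y[45:]

start = time.perf_counter()
stack, w = fit_stack(train_x, train_y)
elapsed = time.perf_counter() - start  # modeling time is tracked with accuracy

lin = fit_linear(train_x, train_y)
knn = fit_knn(train_x, train_y)
print(f"blend weight on linear model: {w:.3f}, fit time: {elapsed:.4f} s")
print(f"test RMSE  linear: {rmse(lin, test_x, test_y):.3f}  "
      f"kNN: {rmse(knn, test_x, test_y):.3f}  "
      f"stack: {rmse(stack, test_x, test_y):.3f}")
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample predictions, keeps the blend weight from favoring whichever base learner overfits the training data most. A real QSPR workflow would use many descriptors and stronger base models.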