Evaluating Molecular Complexity with Open-Source Machine Learning Approaches to Predict Process Mass Intensity

Nicole Tin, Mandeep Chauhan, Kennedy Agwamba, Yibai Sun, Astrid Parsons, Philippa Payne, Remus Osan
DOI: https://doi.org/10.1021/acsomega.4c02427
IF: 4.1
2024-06-24
ACS Omega
Abstract: The application of green chemistry is critical for cultivating environmental responsibility and sustainable practices in pharmaceutical manufacturing. Process mass intensity (PMI) is a key metric that quantifies the resource efficiency of a manufacturing process, but determining what constitutes a successful PMI of a specific molecule is challenging. A recent approach correlated molecular features to a crowdsourced definition of molecular complexity to determine PMI targets. While recent machine learning tools show promise in predicting molecular complexity, a more extensive application could significantly optimize manufacturing processes. To this end, we refine and expand upon the SMART-PMI tool by Sheridan et al. to create an open-source model and application. Our solution emphasizes explainability and parsimony to facilitate a nuanced understanding of prediction and ensure informed decision-making. The resulting model uses four descriptors (the heteroatom count, stereocenter count, unique topological torsion, and connectivity index chi4n) to compute molecular complexity with a comparable 82.6% predictive accuracy and 0.349 RMSE. We develop a corresponding app that takes in structured data files (SDF) to rapidly quantify molecular complexity and provide a PMI target that can be used to drive process development activities. By integrating machine learning explainability and open-source accessibility, we provide flexible tools to advance the field of green chemistry and sustainable pharmaceutical manufacturing.
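As a rough illustration of the two quantities the abstract reports, the sketch below computes PMI under its usual definition (total mass of all materials charged to the process divided by the mass of isolated product) and the RMSE between predicted and reference complexity scores. The function names and every number are hypothetical, chosen only for the example; none of this is the authors' code, and the mapping from molecular complexity to a PMI target is not stated in the abstract, so it is not reproduced here.

```python
import math


def process_mass_intensity(input_masses_kg, product_mass_kg):
    """Return PMI: total mass of all inputs divided by mass of isolated product."""
    if product_mass_kg <= 0:
        raise ValueError("product mass must be positive")
    return sum(input_masses_kg.values()) / product_mass_kg


def rmse(predicted, observed):
    """Return the root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical batch record (illustrative masses only, in kg).
batch_inputs = {
    "starting_material": 12.0,
    "reagents": 8.0,
    "solvents": 150.0,
    "water": 30.0,
}
batch_pmi = process_mass_intensity(batch_inputs, product_mass_kg=2.0)

# Hypothetical predicted vs. reference complexity scores (illustrative only).
example_rmse = rmse([2.0, 3.0, 4.0], [2.5, 3.0, 3.5])

print(f"PMI = {batch_pmi:.1f} kg input per kg product")
print(f"RMSE = {example_rmse:.3f}")
```

A lower PMI means less material consumed per kilogram of product; a process team would compare a measured value like `batch_pmi` against whatever target a complexity model suggests for that molecule.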