Abstract: Formulations, or mixtures of chemical ingredients, are ubiquitous across materials science applications, such as thermoplastics, consumer packaged goods, and energy storage devices. However, finding formulations with optimal properties is difficult because the connection between individual ingredient structures and compositions and the downstream mixture properties is non-obvious. Computational approaches that can traverse the expansive design space offer a promising route to finding formulations with improved properties while minimizing the number of experiments. In this work, we generated a large formulation dataset using high-throughput classical molecular dynamics simulations, resulting in more than 30,000 solvent mixtures ranging from pure-component to five-component systems. We developed three formulation-property relationship approaches to create machine learning models that use ingredient structure and composition as input to predict a formulation property: formulation descriptor aggregation (FDA), formulation descriptor Set2Set (FDS2S), and formulation graph (FG). We found that FDS2S, a new approach that uses a Set2Set layer to aggregate molecular descriptors of individual ingredients, outperforms the other approaches in accurately predicting density, heat of vaporization, and enthalpy of mixing computed from molecular simulations. Feature importance analysis of FDA models reveals that specific substructures are important for predicting these formulation properties, which is useful in designing formulations to achieve target properties. When leveraging an active learning framework to iteratively suggest the next ingredient and composition to experiment on, we found that formulation-property relationships can identify formulations with the highest property values at least two to three times faster than random guessing.
The results demonstrate that formulation-property relationships provide valuable insight to suggest the next experiment even when starting from a limited dataset of ~100 examples. Our research demonstrates the utility of high-throughput simulations and machine learning algorithms applied to designing formulations with promising properties, which could broadly accelerate the design of new materials for a wide range of applications, such as improving the performance of liquid electrolytes for batteries, fuel mixtures for oil and gas, solvent additives for perfumes or paints, and more.
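To make the descriptor-aggregation idea concrete, the following is a minimal sketch of how per-ingredient descriptors and compositions could be combined into one fixed-length formulation vector. The abstract does not state which aggregation functions FDA uses, so the composition-weighted mean and standard deviation here are assumptions, and the function name `aggregate_descriptors` and the toy descriptor values are hypothetical.

```python
import numpy as np


def aggregate_descriptors(descriptors, fractions):
    """Composition-weighted mean and spread of ingredient descriptors.

    descriptors: (n_ingredients, n_features) array, one row per ingredient.
    fractions:   (n_ingredients,) compositions; normalized here to sum to 1.
    Returns a vector of length 2 * n_features, independent of n_ingredients.

    Assumption: weighted mean + weighted standard deviation is only one
    plausible aggregation; the source does not specify the functions used.
    """
    x = np.asarray(descriptors, dtype=float)
    w = np.asarray(fractions, dtype=float)
    if x.ndim != 2 or w.shape != (x.shape[0],):
        raise ValueError("need one composition value per ingredient row")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("compositions must be non-negative and not all zero")
    w = w / w.sum()
    mean = w @ x                    # weighted mean of each feature
    var = w @ (x - mean) ** 2       # weighted variance of each feature
    return np.concatenate([mean, np.sqrt(var)])


# Hypothetical two-feature descriptors for three made-up ingredients.
toy_descriptors = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])

pure = aggregate_descriptors(toy_descriptors[:1], [1.0])
binary = aggregate_descriptors(toy_descriptors[:2], [0.5, 0.5])
ternary = aggregate_descriptors(toy_descriptors, [0.2, 0.3, 0.5])
```

Because the output length does not depend on the number of ingredients and the result is unchanged when ingredients are reordered, pure-component through five-component mixtures can share a single downstream regression model.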

Leveraging High-throughput Molecular Simulations and Machine Learning for Formulation Design
