Preference Optimization for Molecular Language Models

Ryan Park, Ryan Theisen, Navriti Sahni, Marcel Patek, Anna Cichońska, Rayees Rahman
2023-10-19
Abstract: Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not a priori encode certain preferences a chemist may desire. We investigate fine-tuning with Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the fact that molecular language models do not, by default, encode the specific preferences a chemist has when generating chemical structures. The researchers fine-tune such models with Direct Preference Optimization (DPO) so that desired attributes, such as specific substructures, the absence of certain reactive groups, or binding affinity to a specific target, are embedded directly in the generation phase.

The experiments cover two language model architectures (LSTM and GPT). After DPO fine-tuning, the proportion of generated molecules that meet specific filtering criteria increases significantly, with minimal impact on other metrics. The paper also explores DPO fine-tuning for generating molecules with high predicted biological activity against specific protein targets such as EGFR. Here the method excels at raising predicted activity, although in certain cases it may sacrifice some other metrics.

Overall, the paper indicates that DPO is an effective and computationally inexpensive way to adjust molecular generation according to chemists' preferences.
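
To make the idea of preference fine-tuning concrete, the sketch below computes the standard DPO loss for a single (preferred, dispreferred) pair of molecules and shows one way such pairs could be built from a pass/fail filter. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `beta` value, and the pairing scheme (every molecule that passes the filter is paired with every molecule that fails it) are hypothetical, and the inputs are assumed to be sequence log-probabilities of the SMILES strings under the fine-tuned model and a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are total log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") sequences under the model being fine-tuned
    (policy) and under a frozen reference model. beta=0.1 is an
    illustrative value, not one taken from the paper.
    """
    # How much more the policy favors the chosen sequence over the rejected
    # one, relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def make_preference_pairs(molecules, passes_filter):
    """Pair each molecule that passes a filter with each one that fails it.

    `passes_filter` is a hypothetical predicate standing in for whatever
    criterion a chemist cares about (for example, a substructure check).
    """
    chosen = [m for m in molecules if passes_filter(m)]
    rejected = [m for m in molecules if not passes_filter(m)]
    return [(c, r) for c in chosen for r in rejected]
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss drops below that value as the policy shifts probability toward the preferred molecule, which is the behavior fine-tuning exploits.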