Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale

Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran, Grant M. Rotskoff
2024-05-22
Abstract: Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for large language models, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer that converges to an ideal Gibbs-Boltzmann distribution with the reward playing the role of an energy function. Furthermore, this algorithm is highly scalable, does not require reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We deploy this approach to align molecular transformers to generate molecules with externally specified properties and find that it does so robustly, searching through diverse parts of chemical space. While our focus here is on chemical search, we also obtain excellent results on an AI supervised task for LLM alignment, showing that the method is scalable and general.
Machine Learning, Artificial Intelligence, Chemical Physics, Quantitative Methods
What problem does this paper attempt to address?
The paper proposes energy rank alignment (ERA), an algorithm for searching large chemical spaces for molecules with specified properties. Large autoregressive models can already generate chemical compounds, but robust strategies for steering them toward molecules with desired properties are still lacking. ERA draws on the alignment problem for large language models; the difference is that many chemical tasks come with a specific and easily evaluable reward function. The central question is how to use this reward function to optimize an autoregressive policy so that it generates molecules with the desired properties without overly constraining output diversity. ERA is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but it does not require reinforcement learning, and it allows the balance between regularization and reward to be controlled, which promotes sample diversity. In the experiments, ERA aligns molecular transformers to generate molecules with specified chemical properties and also performs well on an LLM alignment task. This indicates that the method is scalable and general rather than limited to chemical search. By adjusting the algorithm's parameters, researchers can control both the diversity and the specific properties of the generated molecules, such as drug-likeness and hydrophobicity. The experimental results show that ERA guides the model toward the target properties without sacrificing diversity or effectiveness.
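The abstract states that the ERA minimizer converges to a Gibbs-Boltzmann distribution in which the reward plays the role of an energy function. The sketch below is only a toy illustration of that target distribution over a handful of candidates, not the ERA training objective itself. It assumes that lower energy is preferred and that a single inverse temperature `beta` sets the trade-off between reward and diversity; the function names, the energy values, and the omission of any reference policy or regularization term are all assumptions made for illustration.

```python
import math


def gibbs_boltzmann(energies, beta):
    """Return probabilities p_i proportional to exp(-beta * U_i).

    `energies` holds hypothetical energy values U_i (assumed: lower is better).
    `beta` is an assumed inverse temperature controlling how sharply the
    distribution concentrates on low-energy candidates.
    """
    # Subtract the minimum energy before exponentiating for numerical stability;
    # this shift cancels in the normalization and does not change the result.
    u_min = min(energies)
    weights = [math.exp(-beta * (u - u_min)) for u in energies]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(probs):
    """Shannon entropy in nats, used here as a simple proxy for sample diversity."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Made-up energies for four candidate molecules (illustrative values only).
energies = [0.0, 0.5, 1.0, 3.0]

for beta in (0.1, 1.0, 10.0):
    probs = gibbs_boltzmann(energies, beta)
    rounded = [round(p, 3) for p in probs]
    print(f"beta={beta}: probs={rounded}, entropy={entropy(probs):.3f}")
```

In this toy setting, a small `beta` gives a nearly uniform distribution (high diversity), while a large `beta` concentrates almost all probability on the lowest-energy candidate. This mirrors, in a simplified way, the trade-off between reward and sample diversity that the summary describes.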