Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale

Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran, Grant M. Rotskoff

2024-05-22

Abstract: Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for large language models, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer that converges to an ideal Gibbs-Boltzmann distribution with the reward playing the role of an energy function. Furthermore, this algorithm is highly scalable, does not require reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We deploy this approach to align molecular transformers to generate molecules with externally specified properties and find that it does so robustly, searching through diverse parts of chemical space. While our focus here is on chemical search, we also obtain excellent results on an AI supervised task for LLM alignment, showing that the method is scalable and general.

Machine Learning, Artificial Intelligence, Chemical Physics, Quantitative Methods

What problem does this paper attempt to address?

The paper proposes a new algorithm, Energy Rank Alignment (ERA), for finding molecules with specific properties in a very large chemical space. Although large autoregressive models can generate chemical compounds, robust strategies for steering them toward molecules with desired properties are still lacking. ERA draws on the alignment problem for large language models, with one difference: many chemical tasks come with an explicit, easily evaluated reward function. The central question is how to use this reward function to optimize an autoregressive policy so that it generates molecules with the desired attributes without overly constraining output diversity. The paper shows that ERA is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but it does not require reinforcement learning, and its balance between regularization and reward can be tuned to promote sample diversity. In the experiments, ERA aligned molecular transformers to generate molecules with specified chemical properties and also achieved excellent results on an LLM alignment task, which indicates that the method is scalable and general rather than specific to chemical search. By adjusting the algorithm's parameters, researchers can control both the diversity and the specific properties of the generated molecules, such as drug-likeness and hydrophobicity. The reported results indicate that ERA steers the model toward the target properties without sacrificing diversity or effectiveness.
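To make the idea of a Gibbs-Boltzmann target concrete, the sketch below computes such a distribution over a small, finite set of candidates and shows how an inverse temperature trades off concentration on low-energy candidates against diversity. It is a toy illustration under stated assumptions, not the ERA objective or the authors' implementation: the function names `gibbs_boltzmann` and `entropy`, the parameter `beta`, the example energies, and the convention that lower energy is preferred are all hypothetical choices made here.

```python
import math


def gibbs_boltzmann(energies, beta):
    """Return probabilities p_i proportional to exp(-beta * U_i).

    Assumption: lower energy means a more desirable candidate, and
    `beta` is an inverse temperature chosen for illustration.
    """
    # Shift by the minimum energy so the largest exponent is 0 (numerical stability).
    u_min = min(energies)
    weights = [math.exp(-beta * (u - u_min)) for u in energies]
    z = sum(weights)  # normalizing constant (partition function)
    return [w / z for w in weights]


def entropy(probs):
    """Shannon entropy in nats, used here as a simple proxy for diversity."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical energies for four candidate molecules.
energies = [0.0, 0.5, 1.0, 3.0]

for beta in (0.1, 1.0, 10.0):
    probs = gibbs_boltzmann(energies, beta)
    rounded = [round(p, 3) for p in probs]
    print(f"beta={beta}: probs={rounded}, entropy={entropy(probs):.3f}")
```

In this toy setting, a small `beta` spreads probability across all candidates (high entropy), while a large `beta` concentrates it on the lowest-energy candidate. This mirrors, in a simplified way, the balance between reward and diversity described above.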

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Sample Efficient Reinforcement Learning with Active Learning for Molecular Design

Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Multi-objective molecular generation via clustered Pareto-based reinforcement learning

Probabilistic hypergraph grammars for efficient molecular optimization

Paddy: Evolutionary Optimization Algorithm for Chemical Systems and Spaces

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Controlled exploration of chemical space by machine learning of coarse-grained representations

Optimizing molecules using efficient queries from property evaluations

Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting

Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization

Ranking over Regression for Bayesian Optimization and Molecule Selection

Automating reward function configuration for drug design

Molecule generation using transformers and policy gradient reinforcement learning

Preference Optimization for Molecular Language Models

Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning

Evaluation of Reinforcement Learning in Transformer-based Molecular Design

Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design

Inferring energy-composition relationships with Bayesian optimization enhances exploration of inorganic materials