Deep reinforcement learning for the direct optimization of gradient separations in liquid chromatography
Alexander Kensert, Pieter Libin, Gert Desmet, Deirdre Cabooter
DOI: https://doi.org/10.1016/j.chroma.2024.464768
2024-04-12
Abstract: While reinforcement learning (RL) has already proven successful in complex tasks, such as controlling large-scale epidemics, mitigating influenza, and playing computer games beyond expert level, it remains largely unexplored in the separation sciences. This paper therefore introduces RL, specifically proximal policy optimization (PPO), to liquid chromatography, and evaluates whether an agent can be trained to optimize separations directly, using only the outcome of a single generic separation as input and a reward signal based on the resolution between peak pairs (taking a value in the interval [-1, 1]). More specifically, PPO agents were trained to select linear (1-segment) or multi-segment (2-, 3-, or 16-segment) gradients in a single experiment, based on the outcome of an initial, generic linear gradient (ϕstart=0.3, ϕend=1.0, and tg=20 min), to improve separations. The mixtures to be separated contained between 10 and 20 components. Furthermore, two agents selecting 16-segment gradients were trained to perform this optimization using either 2 or 3 experiments in sequence, to investigate whether the agents could improve separations further based on previous outcomes. Results showed that the PPO agent can improve separations given the outcome of one generic scouting run as input, by selecting ϕ-programs tailored to the mixture under consideration. Allowing agents more freedom in selecting multi-segment gradients increased the average reward from 0.891 to 0.908, and allowing the agents to perform an additional experiment increased the average reward from 0.908 to 0.918. Finally, the agent significantly outperformed both random experiments and standard experiments (ϕstart=0.0, ϕend=1.0, and tg=20 min): random experiments resulted in average rewards between 0.220 and 0.283, and standard experiments in an average reward of 0.840.
In conclusion, while there is room for improvement, the results demonstrate the potential of RL in chromatography and present an interesting future direction for the automated optimization of separations.
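To make the notion of a multi-segment ϕ-program concrete, the following minimal sketch represents a gradient as a piecewise-linear function of time. It is an illustration only, not the authors' implementation: the function name `make_phi_program`, the choice of equal-length segments, and the node values in the 3-segment example are all assumptions introduced here; only the scouting-run parameters (ϕstart=0.3, ϕend=1.0, tg=20 min) come from the abstract.

```python
def make_phi_program(phi_nodes, t_gradient=20.0):
    """Build phi(t) for a gradient made of len(phi_nodes) - 1 linear segments.

    Assumption (not from the paper): all segments have equal duration, so an
    n-segment gradient is fully described by n + 1 node values of phi.
    """
    nodes = [float(p) for p in phi_nodes]
    if len(nodes) < 2:
        raise ValueError("need at least two node values (one segment)")
    if any(p < 0.0 or p > 1.0 for p in nodes):
        raise ValueError("phi is a volume fraction and must lie in [0, 1]")
    if t_gradient <= 0.0:
        raise ValueError("gradient time must be positive")

    n_segments = len(nodes) - 1
    segment_time = t_gradient / n_segments

    def phi(t):
        # Hold the first/last composition outside the gradient window.
        if t <= 0.0:
            return nodes[0]
        if t >= t_gradient:
            return nodes[-1]
        # Locate the active segment, then interpolate linearly within it.
        i = min(int(t // segment_time), n_segments - 1)
        frac = (t - i * segment_time) / segment_time
        return nodes[i] + frac * (nodes[i + 1] - nodes[i])

    return phi


# Generic scouting run from the abstract: phi 0.3 -> 1.0 over 20 min.
scouting = make_phi_program([0.3, 1.0], t_gradient=20.0)

# Hypothetical 3-segment program (node values chosen only for illustration).
three_segment = make_phi_program([0.2, 0.5, 0.6, 1.0], t_gradient=20.0)
```

Under this parameterization, a 16-segment gradient corresponds to 17 node values, which gives a sense of how the action space grows as the agent is allowed more freedom in shaping the gradient.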