Monoclonal Antibody Bioprocess Engineering Advancements Using Conversational Artificial Intelligence

Kevin Kawchak
DOI: https://doi.org/10.26434/chemrxiv-2024-3m7m1
2024-10-29
Abstract: Processing high-dimensional and complex monoclonal antibody (mAb) bioprocess data in industry is now more efficient due to conversational AI. The human-in-the-loop approach to Large Language Model (LLM) inferencing with document retrieval and chained outputs is a probable benefit to existing biotechnology workflows. Potential risks of using natural language processing are minimized due to the utility of solving problems with vast amounts of structured and unstructured mixed data that can be verified by the Human-AI team. This novel work demonstrates fast, state-of-the-art solutions from the o1-preview, ChatGPT-4o, L3.1-405B, and 3.5 Sonnet models. Specifically, o1-preview provided a response to 16 papers 110x faster than the manuscript author’s time after the number of words was set equal. In addition, ChatGPT-4o was 371x faster than an optimal human researcher at examining a recent paper by Kao, M., et al. and providing an estimate regarding dimension reduction or combinatorial optimization. The third LLM speed advantage, 336x for ChatGPT-4o vs. the manuscript author, was achieved using performance forecasts from Monte Carlo simulations and Markov chain models for a current paper by Konoike, F., et al. Part A featured the individual analysis of 5 recent mAb production papers, which emphasized the proficiency of o1-preview (9.9/10.0), ChatGPT-4o (9.2), and L3.1-405B (9.2) in providing a forecast report. Example generations for o1-preview and L3.1-405B typically established connections between using dimension reduction or combinatorial optimization and improving bioprocesses. Part B models generated tables regarding how LLMs can improve numerical data from 5 different papers using Monte Carlo simulations or Markov chain models. An example from ChatGPT-4o (9.0) was substantially more complete, accurate, and convincing than the table provided by 3.5 Sonnet (8.0).
Part C utilized the report format from Part A combined with the numerical approach from Part B across 6 additional papers, led by o1-preview (9.0) and ChatGPT-4o (8.5). The o1-preview example followed the prompt format well, citing cases of how LLMs will utilize reinforcement learning and Bayesian optimization to improve mAb production. The work represents a standard for utilizing a considerable amount of bioprocess data to forecast new results, with the transition into LLMs providing near-real-time production data analysis aided by document retrieval to provide a synergistic effect with existing machine learning techniques.
Chemistry