SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents

Gabriele Fossi, Youssef Boulaimen, Leila Outemzabet, Nathalie Jeanray, Stephane Gerart, Sebastien Vachenc, Joanna Giemza, Salvatore Raieli
2024-09-24
Abstract: The advancement of artificial intelligence algorithms has expanded their application to several fields such as the biomedical domain. Artificial intelligence systems, including Large Language Models (LLMs), can be particularly advantageous in drug discovery, which is a very long and expensive process. However, LLMs by themselves lack in-depth knowledge about specific domains and can generate factually incorrect information. Moreover, they are not able to perform more complex actions that imply the usage of external tools. Our work is focused on these two issues. Firstly, we show how the implementation of an advanced RAG system can help the LLM to generate more accurate answers to drug-discovery-related questions. The results show that the answers generated by the LLM with the RAG system surpass in quality the answers produced by the model without RAG. Secondly, we show how to create an automatic target dossier using LLMs and incorporating them with external tools that they can use to execute more intricate tasks to gather data such as accessing databases and executing code. The result is a production-ready target dossier containing the acquired information summarized into a PDF and a PowerPoint presentation.
Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address two main issues:

1. **Improving the accuracy and reliability of Large Language Models (LLMs) in drug discovery**:
   - While LLMs perform excellently in many fields, they may generate inaccurate or incorrect information in specific domains (such as biomedicine and drug discovery) due to a lack of in-depth knowledge.
   - The paper proposes implementing an advanced Retrieval-Augmented Generation (RAG) system to help LLMs generate more accurate answers. Experimental results show that the quality of answers generated by LLMs using the RAG system is significantly better than those generated by LLMs without RAG.
2. **Creating automated Target Dossiers**:
   - Target dossiers are important tools for evaluating target suitability in the drug discovery process and usually require a significant amount of time and effort to prepare.
   - The paper demonstrates how to use LLMs in combination with external tools (such as database queries, code execution, etc.) to automatically generate high-quality target dossiers, including PDF documents and PowerPoint presentations. These documents contain the latest information retrieved from multiple databases and are annotated with information sources to ensure verifiability.

Through these two improvements, the paper aims to enhance the efficiency and accuracy of the drug discovery process, reducing human errors and time costs.
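
To make the retrieval-augmented idea concrete, the following is a minimal sketch of the general RAG pattern: retrieve the passages most similar to a question, then place them in the prompt with their source labels so the answer can be traced back. It is not the paper's system. The corpus, the source labels (`doc-1`, ...), the bag-of-words similarity, and the function names `retrieve` and `build_prompt` are all assumptions made for illustration; a real pipeline would likely use learned embeddings and an actual LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; source labels are placeholders, not real database records.
DOCUMENTS = [
    {"source": "doc-1",
     "text": "EGFR is a receptor tyrosine kinase frequently mutated in lung cancer."},
    {"source": "doc-2",
     "text": "Target dossiers summarize evidence on the suitability of a drug target."},
    {"source": "doc-3",
     "text": "Aspirin inhibits cyclooxygenase enzymes and reduces inflammation."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return up to k documents with non-zero similarity to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(d["text"]))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, passages):
    """Assemble a prompt that cites each retrieved passage by its source label."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite sources in brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    question = "Which kinase is mutated in lung cancer?"
    passages = retrieve(question, DOCUMENTS)
    # The resulting string would be sent to an LLM; that call is omitted here.
    print(build_prompt(question, passages))
```

Keeping the source label next to each passage mirrors the summary's point that dossier content is annotated with its information sources, so a reader can verify each claim.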