Leveraging Large Language Models for Medical Information Extraction and Query Generation

Georgios Peikos, Pranav Kasela, Gabriella Pasi
2024-10-31
Abstract: This paper introduces a system that integrates large language models (LLMs) into the clinical trial retrieval process, enhancing the effectiveness of matching patients with eligible trials while maintaining information privacy and allowing expert oversight. We evaluate six LLMs for query generation, focusing on open-source and relatively small models that require minimal computational resources. Our evaluation includes two closed-source and four open-source models, with one specifically trained in the medical field and five general-purpose models. We compare the retrieval effectiveness achieved by LLM-generated queries against those created by medical experts and state-of-the-art methods from the literature. Our findings indicate that the evaluated models reach retrieval effectiveness on par with or greater than expert-created queries. The LLMs consistently outperform standard baselines and other approaches in the literature. The best performing LLMs exhibit fast response times, ranging from 1.7 to 8 seconds, and generate a manageable number of query terms (15-63 on average), making them suitable for practical implementation. Our overall findings suggest that leveraging small, open-source LLMs for clinical trials retrieval can balance performance, computational efficiency, and real-world applicability in medical settings.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses how large language models (LLMs) can generate effective queries for clinical trial retrieval, so that patients are matched with suitable trials more effectively while information privacy and expert oversight are preserved. Specifically, the paper focuses on the following points:

1. **Improving matching effectiveness**: Using LLM-generated queries to better identify suitable trials for a patient within a large collection of clinical trials.
2. **Maintaining information privacy**: Ensuring that sensitive patient information is not disclosed during query generation.
3. **Expert supervision**: Allowing medical experts to review and modify the generated queries before they are used.
4. **Resource efficiency**: Evaluating small, open-source LLMs, which require minimal computational resources and are therefore more feasible in resource-limited medical institutions.

The paper tests these goals by comparing the retrieval effectiveness of queries generated by six LLMs against queries created by medical experts and against existing methods from the literature. The results indicate that small, open-source LLMs reach retrieval effectiveness on par with or greater than expert-created queries, and in certain metrics surpass large closed-source models.
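The workflow summarized above (generate query terms, let an expert review them, then retrieve trials) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' system: `generate_query_terms` is a hypothetical stand-in for a locally hosted LLM call (it only filters stopwords), `expert_review` models the oversight step, `rank_trials` uses plain term-overlap counting in place of a real retrieval model, and the trial identifiers and texts are invented.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "with", "of", "for", "in", "on", "is", "to"}


def generate_query_terms(patient_note: str) -> list[str]:
    """Hypothetical stand-in for an LLM call: keep distinct non-stopword tokens."""
    terms: list[str] = []
    for raw in patient_note.lower().split():
        token = raw.strip(".,;:()")
        if token and token not in STOPWORDS and token not in terms:
            terms.append(token)
    return terms


def expert_review(terms: list[str], remove=(), add=()) -> list[str]:
    """Oversight step: an expert drops unwanted terms and appends missing ones."""
    removed = set(remove)
    kept = [t for t in terms if t not in removed]
    return kept + [t for t in add if t not in kept]


def rank_trials(terms: list[str], trials: dict[str, str]) -> list[tuple[str, int]]:
    """Score each trial by how often the query terms occur in its text."""
    scores = []
    for trial_id, text in trials.items():
        counts = Counter(w.strip(".,;:()") for w in text.lower().split())
        scores.append((trial_id, sum(counts[t] for t in terms)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Invented example data.
note = "62-year-old woman with type 2 diabetes and chronic kidney disease."
terms = generate_query_terms(note)
# The expert removes demographic details and adds a related clinical term.
reviewed = expert_review(
    terms, remove=["62-year-old", "woman", "type", "2"], add=["nephropathy"]
)
trials = {
    "TRIAL-A": "Study of a new drug for diabetes patients with chronic kidney disease.",
    "TRIAL-B": "Exercise program for adults with asthma.",
}
ranking = rank_trials(reviewed, trials)
```

In this sketch only the reviewed term list reaches the retrieval step, which is one simple way to picture how a query, and not the full patient note, could be what leaves the expert's hands.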