Automated Mass Extraction of Over 680,000 PICOs from Clinical Study Abstracts Using Generative AI: A Proof-of-Concept Study

Tim Reason,Julia Langham,Andy Gimblett
DOI: https://doi.org/10.1007/s40290-024-00539-6
Abstract:Background: Generative artificial intelligence (GenAI) shows promise in automating key tasks involved in conducting systematic literature reviews (SLRs), including screening, bias assessment and data extraction. This potential automation is increasingly relevant as pharmaceutical developers face challenging requirements for timely and precise SLRs using the population, intervention, comparator and outcome (PICO) framework, such as those under the impending European Union (EU) Health Technology Assessment Regulation 2021/2282 (HTAR). This proof-of-concept study aimed to evaluate the feasibility, accuracy and efficiency of using GenAI for mass extraction of PICOs from PubMed abstracts. Methods: Abstracts were retrieved from PubMed using a search string targeting randomised controlled trials. A PubMed clinical study 'specific/narrow' filter was also applied. Retrieved abstracts were processed using the OpenAI Batch application programming interface (API), which allowed parallel processing and interaction with Generative Pre-trained Transformer 4 Omni (GPT-4o) via custom Python scripts. PICO elements were extracted using a zero-shot prompting strategy. Results were stored in CSV files and subsequently imported into a PostgreSQL database. Results: The PubMed search returned 682,667 abstracts. PICOs from all abstracts were extracted in < 3 h, with an average processing time of 200 s per 1000 abstracts. A total of 395,992,770 tokens were processed, with an average of 580 tokens per abstract. The total cost was $3390. On the basis of a random sample of 350 abstracts, human verification confirmed that GPT-4o accurately and comprehensively extracted 342 (98%) of all PICOs, with only outcome elements rarely missed. Conclusions: Using GenAI to extract PICOs from clinical study abstracts could fundamentally transform the way SLRs are conducted. By enabling pharmaceutical developers to anticipate PICO requirements, this approach allows for proactive preparation for the EU HTAR process, or other health technology assessments (HTAs), streamlining efficiency and reducing the burden of meeting these requirements.
What problem does this paper attempt to address?