Abstract

Motivation: Automated extraction of population, intervention, comparison/control, and outcome (PICO) elements from randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) to PICO extraction. However, performance remains suboptimal because PICO information in RCT abstracts is complex and difficult to annotate.

Results: We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in each abstract were classified into four sections: background, methods, results, and conclusions. Next, the NER model was applied to the title and the methods-section sentences, which together contain more than 96% of the PICO information, to extract the PICO elements. We evaluated the pipeline on three datasets: EBM-NLPmod, a randomly selected and re-annotated set of 500 RCT abstracts from the EBM-NLP corpus; a dataset of 150 Coronavirus Disease 2019 (COVID-19) RCT abstracts; and a dataset of 150 Alzheimer’s disease (AD) RCT abstracts. In end-to-end evaluation, the approach achieved overall micro F1 scores of 0.833 on EBM-NLPmod, 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset at the token level, and 0.712, 0.850, and 0.805, respectively, at the entity level.

Availability and implementation: Our code and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO.
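The routing logic of the two-step pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository): `toy_classify_section` and `toy_extract_pico` are hypothetical keyword and lexicon stand-ins for the prompt-based sentence classifier and the trained NER model, and the names `RCTAbstract`, `run_pipeline`, and `TOY_LEXICON` are likewise illustrative. The point it shows is that every sentence receives one of the four section labels, and only the title and the methods sentences are passed on to entity extraction.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

SECTIONS = ("BACKGROUND", "METHODS", "RESULTS", "CONCLUSIONS")
Entity = Tuple[str, int, int, str]  # (PICO label, start char, end char, surface text)


@dataclass
class RCTAbstract:  # hypothetical container for one abstract
    title: str
    sentences: List[str]


def run_pipeline(abstract: RCTAbstract,
                 classify_section: Callable[[str], str],
                 extract_pico: Callable[[str], List[Entity]]) -> List[Entity]:
    # Step (i): assign one of the four section labels to every sentence.
    # Step (ii): run entity extraction only on the title plus METHODS sentences.
    targets = [abstract.title]
    for sentence in abstract.sentences:
        label = classify_section(sentence)
        if label not in SECTIONS:
            raise ValueError(f"unexpected section label: {label}")
        if label == "METHODS":
            targets.append(sentence)
    entities: List[Entity] = []
    for text in targets:
        entities.extend(extract_pico(text))
    return entities


def toy_classify_section(sentence: str) -> str:
    # Hypothetical keyword rules standing in for the prompt-based classifier.
    s = sentence.lower()
    if "randomized" in s or "primary outcome" in s:
        return "METHODS"
    if "reduced" in s or "%" in s:
        return "RESULTS"
    if "we conclude" in s:
        return "CONCLUSIONS"
    return "BACKGROUND"


TOY_LEXICON = {  # hypothetical phrase -> PICO label table
    "adults with covid-19": "POPULATION",
    "remdesivir": "INTERVENTION",
    "placebo": "COMPARATOR",
    "mortality": "OUTCOME",
}


def toy_extract_pico(text: str) -> List[Entity]:
    # Hypothetical case-insensitive lexicon lookup standing in for the NER model;
    # it reports only the first occurrence of each phrase in the text.
    lowered = text.lower()
    found: List[Entity] = []
    for phrase, label in TOY_LEXICON.items():
        start = lowered.find(phrase)
        if start != -1:
            end = start + len(phrase)
            found.append((label, start, end, text[start:end]))
    return found


abstract = RCTAbstract(
    title="Remdesivir versus placebo in adults with COVID-19",
    sentences=[
        "COVID-19 mortality remains high.",
        "We randomized adults with COVID-19 to remdesivir or placebo.",
        "The primary outcome was 28-day mortality.",
        "Remdesivir reduced 28-day mortality by 4%.",
    ],
)
entities = run_pipeline(abstract, toy_classify_section, toy_extract_pico)
```

In this toy input, "mortality" appears in the background, methods, and results sentences, but only the methods occurrence is extracted, because the other two sentences never reach the extraction step. Restricting step (ii) this way trades a small amount of recall (PICO mentions outside the title and methods) for a cleaner input to the NER model.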
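Because the scores are reported at two granularities, a small example may help show how token-level and entity-level micro F1 differ. The sketch below assumes that token-level scoring compares the PICO label of each token and that entity-level scoring requires an exact match of label and span boundaries; the paper's precise matching criteria may differ. Micro averaging pools true positives, false positives, and false negatives over all classes and documents before computing F1. The function names, the `OUTSIDE` label, and the example tokens are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

OUTSIDE = "O"  # label for tokens that belong to no PICO element
Span = Tuple[str, int, int]  # (label, start token, end token exclusive)
Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_level_counts(gold: List[str], pred: List[str]) -> Counts:
    # Compare every token's label; a wrong non-O label is both an FP and an FN.
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != OUTSIDE and p == g:
            tp += 1
            continue
        if p != OUTSIDE:
            fp += 1
        if g != OUTSIDE:
            fn += 1
    return tp, fp, fn


def spans_from_labels(labels: List[str]) -> Set[Span]:
    # Group maximal runs of one non-O label into spans. With plain (non-BIO)
    # labels, two adjacent entities of the same type merge into one span.
    spans: Set[Span] = set()
    start = None
    for i, label in enumerate(labels + [OUTSIDE]):
        if start is not None and label != labels[start]:
            spans.add((labels[start], start, i))
            start = None
        if start is None and label != OUTSIDE:
            start = i
    return spans


def entity_level_counts(gold: List[str], pred: List[str]) -> Counts:
    # Exact match: label, start, and end must all agree.
    gold_spans, pred_spans = spans_from_labels(gold), spans_from_labels(pred)
    tp = len(gold_spans & pred_spans)
    return tp, len(pred_spans) - tp, len(gold_spans) - tp


def corpus_micro_f1(docs: Sequence[Tuple[List[str], List[str]]],
                    count_fn: Callable[[List[str], List[str]], Counts]) -> float:
    # Micro averaging: pool counts over all documents and classes, then compute F1.
    tp = fp = fn = 0
    for gold, pred in docs:
        a, b, c = count_fn(gold, pred)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1(tp, fp, fn)


# Tokens: "adults with type 2 diabetes received metformin"
gold = ["POPULATION"] * 5 + [OUTSIDE, "INTERVENTION"]
pred = ["POPULATION"] * 4 + [OUTSIDE, OUTSIDE, "INTERVENTION"]  # last population token missed

token_f1 = corpus_micro_f1([(gold, pred)], token_level_counts)
entity_f1 = corpus_micro_f1([(gold, pred)], entity_level_counts)
```

Missing one boundary token in a five-token population span costs a single false negative at the token level (F1 = 10/11 ≈ 0.909), but at the entity level it turns a nearly correct entity into one false positive and one false negative (F1 = 0.5). This is one reason entity-level scores are typically lower than token-level scores for the same predictions.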

STEREO: A Pipeline for Extracting Experiment Statistics, Conditions, and Topics from Scientific Papers

Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports

A Rule-Based Information Extraction System for Human-Readable Semi-Structured Scientific Documents

What's New? Summarizing Contributions in Scientific Literature

Survey and empirical comparison of different approaches for text extraction from scholarly figures

Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles

AutoIE: An Automated Framework for Information Extraction from Scientific Literature

Metadata Extraction for Scientific Papers

PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data

LitStoryTeller: An Interactive System for Visual Exploration of Scientific Papers Leveraging Named entities and Comparative Sentences

Extracting Core Claims from Scientific Articles

A Supervised Approach to Extractive Summarisation of Scientific Papers

Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

CitationIE: Leveraging the Citation Graph for Scientific Information Extraction

CCS Explorer: Relevance Prediction, Extractive Summarization, and Named Entity Recognition from Clinical Cohort Studies

A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)

Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

All Data on the Table: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction

Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis