Abstract: The core of evidence-based medicine is to read and analyze numerous papers in the medical literature on a specific clinical problem and summarize the authoritative answers to that problem. Currently, to formulate a clear and focused clinical problem, the popular PICO framework is usually adopted, in which each clinical problem is considered to consist of four parts: patient/problem (P), intervention (I), comparison (C) and outcome (O). In this study, we compared several classification models that are commonly used in traditional machine learning. Next, we developed a multitask classification model based on a soft-margin SVM with a specialized feature engineering method that combines 1-2gram analysis with TF-IDF analysis. Finally, we trained and tested several generic models on an open-source data set from BioNLP 2018. The results show that the proposed multitask SVM classification model based on 1-2gram TF-IDF features exhibits the best performance among the tested models.

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to automatically extract PICO elements from the abstracts of randomized controlled trials (RCTs). Specifically, the authors aim to improve the efficiency of identifying the four components, patient/problem (P), intervention (I), comparison (C), and outcome (O), from structured abstracts using machine learning methods, in order to support the research and application of evidence-based medicine (EBM).

### Problem Background

In evidence-based medicine, researchers need to read and analyze a large amount of literature to summarize authoritative answers to specific clinical questions. To clarify and focus a clinical question, the PICO framework is usually adopted, in which each clinical question is decomposed into four parts: patient/problem (P), intervention (I), comparison (C), and outcome (O). However, because these PICO elements are not clearly labeled in the structured abstracts of most medical papers, literature retrieval and screening are very time-consuming. Automatically extracting PICO elements from structured abstracts would therefore greatly improve the efficiency of evidence-based medicine work.

### Main Contributions of the Paper

1. **Feature Engineering Method**: Proposes a feature engineering method based on TF-IDF (term frequency-inverse document frequency) combined with 1-2gram analysis.
2. **Multi-task Classification Model**: Develops a multi-task classification model based on a soft-margin SVM (support vector machine) for automatically extracting PICO elements at the sentence level.
3. **Experimental Verification**: Verifies the effectiveness of the 1-2gram model through six groups of controlled experiments and compares its performance with the word2vec word embedding method.
4. **Comparison with Other Classic Methods**: Compares the proposed model with classic classification methods such as random forest (RF), XGBoost, naive Bayes (NB), and long short-term memory (LSTM) networks; the results show that the proposed model performs better.

### Formula Display

- **Term Frequency (TF) Formula**:

  \[ tf_{i,j}=\frac{n_{i,j}}{\sum_{k}n_{k,j}} \]

  where \(n_{i,j}\) is the number of times the word \(w_i\) appears in the sentence \(s_j\), and the denominator is the total number of occurrences of all words in the sentence \(s_j\).

- **Inverse Document Frequency (IDF) Formula**:

  \[ idf_i = \log\frac{|D|}{|\{j:w_i\in s_j\}| + 1} \]

  where \(|D|\) is the total number of sentences in the data set, \(|\{j:w_i\in s_j\}|\) is the number of sentences containing the word \(w_i\), and 1 is added to prevent the denominator from being zero.

- **TF-IDF Formula**:

  \[ tfidf_{i,j}=tf_{i,j}\times idf_i \]

- **Soft-margin SVM Constraint Conditions**:

  \[ y_i(w^T x_i + b)\geq 1-\xi_i,\quad i = 1,\ldots,n \]

  where \(x_i\) is the vector representation of sentence \(i\), \(y_i\) is the label of sentence \(i\), and \(\xi_i\geq 0\) is a slack variable.

- **Soft-margin SVM Objective Function**:

  \[ \min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i = 1}^n\xi_i \]

  where \(C>0\) is a penalty parameter that controls the relative weight of the two terms in the objective function.

With these methods and models, the authors report that the proposed model achieved the best performance among the tested models in automatically extracting PICO elements from medical literature, which supports evidence-based medicine research.
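The formulas in the Formula Display section can be illustrated with a short, standard-library-only Python sketch. It is not the authors' implementation: the function names (`ngrams`, `tfidf`, `soft_margin_objective`), the whitespace tokenization, and the toy sentences are assumptions made for illustration only. The sketch builds 1-2gram terms, applies the TF and IDF definitions as written (including the `+ 1` in the IDF denominator), and evaluates the soft-margin objective for a given `w` and `b`, taking each slack variable as the smallest value that satisfies its constraint. Training the SVM and the multi-task arrangement are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (space-joined strings) with n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(sentences):
    """Compute 1-2gram TF-IDF weights, one dict per sentence.

    tf_{i,j} = n_{i,j} / sum_k n_{k,j}
    idf_i    = log(|D| / (|{j : w_i in s_j}| + 1))
    Tokenization by lowercasing and whitespace splitting is an assumption.
    """
    docs = [Counter(ngrams(s.lower().split())) for s in sentences]
    num_docs = len(docs)

    # Document frequency: number of sentences that contain each term.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    weights = []
    for counts in docs:
        total = sum(counts.values())  # sum_k n_{k,j}
        weights.append({
            term: (n / total) * math.log(num_docs / (df[term] + 1))
            for term, n in counts.items()
        })
    return weights


def soft_margin_objective(w, b, xs, ys, C=1.0):
    """Evaluate 0.5 * ||w||^2 + C * sum_i xi_i for fixed w and b.

    For fixed (w, b), the smallest slack satisfying
    y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0 is
    xi_i = max(0, 1 - y_i (w^T x_i + b)).
    """
    slack_sum = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
        slack_sum += max(0.0, 1.0 - margin)
    return 0.5 * sum(wk * wk for wk in w) + C * slack_sum


if __name__ == "__main__":
    # Toy sentences, invented for illustration.
    toy = [
        "patients received aspirin",
        "patients received placebo",
        "outcome was mortality",
    ]
    for sentence, row in zip(toy, tfidf(toy)):
        print(sentence, row)
```

With this IDF definition, a term that occurs in all but one sentence gets a weight of zero, and a term that occurs in every sentence gets a slightly negative weight; this follows from the `+ 1` in the denominator. Library implementations such as scikit-learn's `TfidfVectorizer` use a different smoothing, so their values will not match this sketch exactly.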

Extracting PICO elements from RCT abstracts using 1-2gram analysis and multitask classification

Advancing PICO Element Detection in Biomedical Text via Deep Neural Networks

Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach

Unlocking the Power of Deep PICO Extraction: Step-wise Medical NER Identification

Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

AlpaPICO: Extraction of PICO frames from clinical trial documents using LLMs

A span-based model for extracting overlapping PICO entities from randomized controlled trial publications

Predicting Clinical Trial Results by Implicit Evidence Integration

Alibaba DAMO Academy at TREC Precision Medicine 2020: State-of-the-art Evidence Retriever for Precision Medicine with Expert-in-the-loop Active Learning

Automated information extraction model enhancing traditional Chinese medicine RCT evidence extraction (Evi-BERT): algorithm development and validation

Using LLMs to label medical papers according to the CIViC evidence model

Distinguishing Transformative from Incremental Clinical Evidence: A Classifier of Clinical Research using Textual features from Abstracts and Citing Sentences

Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit

Inferring Which Medical Treatments Work from Reports of Clinical Trials

Rethinking PICO in the Machine Learning Era: ML-PICO

Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks

Extraction of evidence tables from abstracts of randomized clinical trials using a maximum entropy classifier and global constraints

Extracting BI-RADS Features from Mammography Reports in Chinese Based on Machine Learning

Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations

Zero-Shot Information Extraction for Clinical Meta-Analysis using Large Language Models