PICO to PICOS: Weak Supervision to Extend Datasets with New Labels

Anjani Dhrangadhariya, Gaetano Manzo, Henning Müller
DOI: https://doi.org/10.3233/SHTI240775
2024-08-22
Abstract: Hand-labelling clinical corpora can be costly and inflexible, requiring re-annotation every time new classes need to be extracted. PICO (Participant, Intervention, Comparator, Outcome) information extraction can expedite conducting systematic reviews to answer clinical questions. However, PICO frequently extends to other entities such as Study type and design, trial context, and timeframe, requiring manual re-annotation of existing corpora. In this paper, we adapt Snorkel's weak supervision methodology to extend clinical corpora to new entities without extensive hand labelling. Specifically, we enrich the EBM-PICO corpus with new entities through an example of "Study type and design" extraction. Using weak supervision, we obtain programmatic labels on 4,081 EBM-PICO documents, achieving an F1-score of 85.02% on the test set.
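To make the idea of programmatic labelling concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline and does not use the Snorkel library: the keyword list, the three labelling functions, and the names `lf_study_keyword`, `lf_blinding_pattern`, `lf_stopword`, and `weak_label` are all hypothetical, and simple majority voting stands in for Snorkel's learned label model. Each labelling function votes on a token or abstains, and the votes are combined into one weak label per token.

```python
import re
from collections import Counter

# Label scheme assumed for this sketch: abstain, not a study-design token, study-design token.
ABSTAIN, OTHER, STUDY = -1, 0, 1

# Hypothetical keyword lists, chosen only for illustration.
STUDY_TERMS = {"randomized", "randomised", "double-blind", "crossover", "cohort", "trial"}
STOPWORDS = {"a", "an", "the", "of", "in", "and", "with", "was"}


def lf_study_keyword(token):
    """Vote STUDY when the token is a known study-design keyword."""
    return STUDY if token.lower() in STUDY_TERMS else ABSTAIN


def lf_blinding_pattern(token):
    """Vote STUDY when the token looks like a blinding description."""
    pattern = r"(single|double|triple)-blind(ed)?"
    return STUDY if re.fullmatch(pattern, token.lower()) else ABSTAIN


def lf_stopword(token):
    """Vote OTHER for common function words."""
    return OTHER if token.lower() in STOPWORDS else ABSTAIN


LABELING_FUNCTIONS = [lf_study_keyword, lf_blinding_pattern, lf_stopword]


def majority_vote(votes):
    """Combine votes, ignoring abstentions; ties and all-abstain give ABSTAIN."""
    counted = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counted:
        return ABSTAIN
    if len(counted) > 1 and counted[0][1] == counted[1][1]:
        return ABSTAIN
    return counted[0][0]


def weak_label(tokens):
    """Return one weak label per token."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in tokens]
```

Under these assumptions, `weak_label("A randomized double-blind trial of aspirin".split())` returns `[0, 1, 1, 1, 0, -1]`: the function words are labelled as other, the three study-design tokens are labelled as such, and "aspirin" is left unlabelled because every function abstains. A downstream model would then be trained on labels like these instead of on hand annotations.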