Abstract:

Background: Parsing, which generates the syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed for general English, such as the Stanford parser, have been applied to clinical text, there have been no formal evaluations or comparisons of their performance in the medical domain.

Methods: In this study, we investigated the performance of three state-of-the-art parsers (the Stanford parser, the Bikel parser, and the Charniak parser) using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three parsers on the clinical Treebanks with their default settings. Then, we re-trained the parsers using the clinical Treebanks and evaluated their performance using 10-fold cross-validation. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank.

Results: The original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) than on general English text. After re-training on the clinical Treebanks, all parsers achieved better performance. The Stanford parser performed best, reaching a Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross-validation. When the clinical Treebanks and the Penn Treebank were combined, the Charniak parser achieved the highest Bracketing F-measure on progress notes (73.53%), and the Stanford parser achieved the highest F-measure on the MiPACQ corpus (84.15%).

Conclusions: Our study demonstrates that re-training on clinical Treebanks is critical for improving the performance of general English parsers on clinical text, and that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text.
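The Bracketing F-measure reported above compares the labeled constituents of a predicted parse tree against those of the gold-standard tree. The following is a minimal Python sketch of that idea, assuming PARSEVAL-style labeled bracket matching over Penn Treebank-style bracketed strings. The abstract does not state which scoring tool or settings were used, so the function names (`tokenize`, `parse`, `brackets`, `bracketing_prf`) and the example sentence are hypothetical, and simplifications such as not filtering punctuation are assumptions of this sketch.

```python
from collections import Counter


def tokenize(s):
    """Split a bracketed tree string into parentheses and symbols."""
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves (words) are kept as plain strings.
    """
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def brackets(tree):
    """Collect labeled spans (label, start, end) above the POS-tag level."""
    spans = Counter()

    def walk(n, start):
        label, children = n
        # A preterminal such as (NN pain) covers one word and is not counted.
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracketing_prf(gold_str, pred_str):
    """Return (precision, recall, F-measure) over labeled brackets."""
    gold = brackets(parse(tokenize(gold_str)))
    pred = brackets(parse(tokenize(pred_str)))
    matched = sum((gold & pred).values())  # multiset intersection
    p = matched / sum(pred.values()) if pred else 0.0
    r = matched / sum(gold.values()) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical example: the predicted tree splits one gold NP into two.
gold = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest) (NN pain))))"
pred = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest)) (NP (NN pain))))"
precision, recall, f_measure = bracketing_prf(gold, pred)
```

In this example the gold tree has 4 brackets and the predicted tree has 5, of which 3 match, so precision is 3/5, recall is 3/4, and the F-measure is their harmonic mean (about 0.667). A corpus-level score would sum matched, gold, and predicted bracket counts over all sentences before computing the ratios.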

Parsing Clinical Text: How Good Are the State-of-the-art Parsers?
