Abstract:

Background: Semantic textual similarity (STS) is one of the fundamental tasks in natural language processing (NLP). Many shared tasks and corpora for STS have been organized and curated in the general English domain; however, such resources are limited in the biomedical domain. In 2019, the National NLP Clinical Challenges (n2c2) challenge developed a comprehensive clinical STS dataset and organized a community effort to solicit state-of-the-art solutions for clinical STS.

Objective: This study presents the transformer-based clinical STS models we developed during this challenge, as well as new models we explored after the challenge. This project is part of the 2019 n2c2/Open Health NLP shared task on clinical STS.

Methods: We explored 3 transformer-based models for clinical STS: Bidirectional Encoder Representations from Transformers (BERT), XLNet, and Robustly optimized BERT approach (RoBERTa). We examined transformer models pretrained on general English text and on clinical text. We also explored using a general English STS dataset as a supplementary corpus in addition to the clinical training set developed in this challenge. Furthermore, we investigated various ensemble methods to combine different transformer models.

Results: Our best submission, based on the XLNet model, achieved the third-best performance in this challenge (Pearson correlation of 0.8864). After the challenge, we explored other transformer models and improved the performance to 0.9065 using a RoBERTa model, which outperformed the best-performing system developed in this challenge (Pearson correlation of 0.9010).

Conclusions: This study demonstrated the effectiveness of transformer-based models for measuring semantic similarity in clinical text. Our models can be applied to clinical applications such as clinical text deduplication and summarization.
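The abstract reports results as Pearson correlations and mentions ensembles of transformer models without giving details. As a minimal illustrative sketch, the Python below combines the predictions of two models by simple averaging and scores the result against gold annotations with the Pearson correlation. Simple averaging is an assumption chosen for illustration, not necessarily one of the ensemble methods used in the study. The function names `pearson` and `average_ensemble`, the assumed 0 to 5 similarity scale, and all scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ensemble(model_scores):
    """Average the per-pair scores of several models (assumed ensemble rule)."""
    return [sum(scores) / len(scores) for scores in zip(*model_scores)]


# Made-up similarity scores for five sentence pairs (assumed 0 to 5 scale).
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
model_a = [0.5, 1.0, 3.5, 4.0, 4.5]  # hypothetical predictions of model A
model_b = [0.0, 2.0, 2.5, 5.0, 5.0]  # hypothetical predictions of model B

ensemble = average_ensemble([model_a, model_b])
print("model A :", round(pearson(gold, model_a), 4))
print("model B :", round(pearson(gold, model_b), 4))
print("ensemble:", round(pearson(gold, ensemble), 4))
```

Pearson correlation measures only the linear agreement between predicted and gold scores, so a system can score highly even if its predictions are shifted or rescaled relative to the annotations.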

A Semantic Textual Similarity Measurement Model Based on the Syntactic-Semantic Representation

STSG: A Short Text Semantic Graph Model for Similarity Computing Based on Dependency Parsing and Pre-trained Language Models

A Short-Text Similarity Model Combining Semantic and Syntactic Information

A New Hybrid Improved Method for Measuring Concept Semantic Similarity in WordNet

A Novel Comprehensive Approach for Estimating Concept Semantic Similarity in WordNet

C-STS: Conditional Semantic Textual Similarity

A Hybrid Semantic Similarity Measurement for Geospatial Entities

Linguistically Conditioned Semantic Textual Similarity

A novel locality-sensitive hashing relational graph matching network for semantic textual similarity measurement

Semantic similarity prediction is better than other semantic similarity measures

Hybrid Attention Based Neural Architecture for Text Semantics Similarity Measurement

Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Semantic Similarity Analysis via Syntax Dependency Structure and Gate Recurrent Unit

Semantic Similarity Score for Measuring Visual Similarity at Semantic Level

Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models

Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

PolyUCOMP-CORE_TYPED: Computing Semantic Textual Similarity Using Overlapped Senses

MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity

Learning Semantic Textual Similarity from Conversations

MedSTS: A Resource for Clinical Semantic Textual Similarity

Collective Human Opinions in Semantic Textual Similarity