Abstract: Background: Rapid advancements in biomedical research have accelerated the growth in the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, the vast majority of this information is poorly organized, making it difficult to navigate. Emerging technologies such as ontologies and knowledge bases (KBs) could help organize and track the information associated with biomedical research developments. A major challenge in the automatic construction of ontologies and KBs is the identification of words and their respective sense(s) from a free-text corpus. Word-sense induction (WSI) is the task of automatically inducing the different senses of a target word across different contexts. In the last two decades, there have been several efforts on WSI. However, few methods are effective in biomedicine and life sciences.
Methods: We developed a framework for biomedical entity sense induction using a mixture of natural language processing, supervised, and unsupervised learning methods. It is composed of three main steps: (1) a polysemy detection method to determine if a biomedical entity has many possible meanings; (2) a clustering quality index-based approach to predict the number of senses for the biomedical entity; and (3) a method to induce the concept(s) (i.e., senses) of the biomedical entity in a given context.
Results: To evaluate our framework, we used the well-known MSH WSD polysemic dataset, which contains 203 annotated ambiguous biomedical entities, where each entity is linked to 2-5 concepts. First, our polysemy detection method obtained an F-measure of 98%. Second, our approach for predicting the number of senses achieved an F-measure of 93%. Finally, we induced the concepts of the biomedical entities based on a clustering algorithm and then extracted the keywords of each cluster to represent the concept.
Conclusions: We have developed a framework for biomedical entity sense induction with promising results. Our study results can benefit a number of downstream applications, for example, by helping to resolve concept ambiguities when building Semantic Web KBs from biomedical text.
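
Step (2) of the framework, predicting the number of senses from a clustering quality index, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the quality index, so the k-means routine, the silhouette coefficient, the function names, and the synthetic 2-D "context vectors" below are all assumptions chosen for illustration. Only the candidate range of 2 to 5 senses mirrors the MSH WSD dataset described above.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means (an assumed stand-in); returns one label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (an assumed quality index); higher is better."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def predict_num_senses(context_vectors, k_min=2, k_max=5):
    """Score each candidate sense count and return the best one with all scores."""
    scores = {k: silhouette(context_vectors, kmeans(context_vectors, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated groups standing in for three senses.
rng = random.Random(42)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_vectors = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
               for cx, cy in true_centers for _ in range(15)]

best_k, scores = predict_num_senses(toy_vectors)
print(best_k, scores)
```

In a real setting the toy vectors would be replaced by feature vectors built from the contexts in which the ambiguous entity occurs, and a different index or clustering method could be swapped in without changing the selection loop.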

A Knowledge Based Method for Chinese Word Sense Induction

Chinese Word Sense Induction Based on Hierarchical Clustering Algorithm

Applying Spectral Clustering for Chinese Word Sense Induction.

Inducing Word Senses for Cross-lingual Document Clustering

Word Sense Induction Using Lexical Chain Based Hypergraph Model.

LSTC System for Chinese Word Sense Induction

To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models

Chinese Word Sense Induction with Basic Clustering Algorithms.

Unsupervised Word Sense Induction Using Rival Penalized Competitive Learning.

Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010.

Inducing Word Sense with Automatically Learned Hidden Concepts.

Word Sense Indicators: Effective Feature For Chinese Word Sense Disambiguation

Word Sense Disambiguation Based on Word Sense Indicators

Word Clustering for Collocation-Based Word Sense Disambiguation

Solution Strategies for Word Sense Problems Based on Vector Space Model and Maximum Entropy Model

Word Sense Learning Based on Feature Selection and MDL Principle

Chinese WSD Based on Selecting the Best Seeds from Collocations

Learning Word Sense with Feature Selection and Order Identification Capabilities.

A Novel Framework for Biomedical Entity Sense Induction.