Abstract:

Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources such as the UMLS provide a reference thesaurus for annotating the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

Methods: We study existing knowledge-based WSD approaches, complementing earlier studies of statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the context of the ambiguous word with each candidate sense, measuring overlap against a representation of the sense built from its definitions, synonyms and related terms. The second approach collects training data for each candidate sense automatically: queries built from monosemous synonyms and related terms retrieve MEDLINE citations, and a machine learning approach is then trained on the resulting corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD, ranking nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus: the context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to decide among the candidate concepts. We report the accuracy of each method on the WSD test collection available from the NLM.

Conclusions: The last approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of Metathesaurus concepts, does not perform well compared with the first two methods. Combining the methods improves performance over the individual approaches, but performance remains below that of statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions for improving the existing methods and for making the Metathesaurus more effective in WSD.
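
As an illustration of the first approach only, the following is a minimal sketch of overlap-based disambiguation. It is not the authors' implementation: the bag-of-words tokenization, the use of cosine similarity as the overlap measure, the function names, and the toy sense profiles are all assumptions made for the example. In the setting described above, each profile would instead be built from Metathesaurus definitions, synonyms and related terms.

```python
import re
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed overlap measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context, sense_profiles):
    """Return the sense whose profile overlaps most with the context, plus all scores."""
    ctx = bag_of_words(context)
    scores = {
        sense: cosine(ctx, bag_of_words(profile))
        for sense, profile in sense_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy profiles standing in for text gathered from a thesaurus.
SENSE_PROFILES = {
    "cold_temperature": "cold temperature low temperature freezing weather exposure",
    "common_cold": "common cold viral infection upper respiratory tract nose throat virus",
}

best, scores = disambiguate(
    "The patient caught a cold, a viral infection of the nose and throat.",
    SENSE_PROFILES,
)
```

Here the context shares five terms with the `common_cold` profile and only one with `cold_temperature`, so the former is selected. With an empty context every score is zero and the choice is arbitrary, which is one reason a fallback such as a most-frequent-sense default is often paired with overlap methods.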

Two statistics methods of Chinese word sense disambiguation

Unsupervised Word Sense Disambiguation Based on WordNet

Word Sense Disambiguation Based on Improved Bayesian Classifiers

Chinese WSD Based on Selecting the Best Seeds from Collocations

Word Sense Disambiguation: A Structured Learning Perspective.

Coarse-Grained Word Sense Disambiguation Using Features Described in the Lexicon

Research on dual pattern of unsupervised and supervised word sense disambiguation

Analysis and Comparison of 4 Kinds of Statistical Word Sense Disambiguation Models

Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited

A survey of Chinese word sense disambiguation: Resources, methods and evaluation

Survey of Word Sense Disambiguation Approaches.

Word Sense Disambiguation Based on Positional Weighted Context

Chinese Word Sense Disambiguation Based on Extension Theory

The Research Progress of Statistical Word Sense Disambiguation

Word Sense Disambiguation using Knowledge-based Word Similarity

A Unified Model for Word Sense Representation and Disambiguation.

Word Sense Disambiguation Method with Topic Feature

A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese

Knowledge-based biomedical word sense disambiguation: comparison of approaches

An Approach to Corpus-based Word Sense Disambiguation

Ambiguity Meets Uncertainty: Investigating Uncertainty Estimation for Word Sense Disambiguation