The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change

Dominik Schlechtweg, Shafqat Mumtaz Virk, Pauline Sander, Emma Sköldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Jonas Kuhn, Sabine Schulte im Walde
2024-02-05
Abstract: We present the DURel tool, which implements the annotation of semantic proximity between uses of words in an online, open-source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This makes it possible to measure word senses with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation effort. The tool offers additional functionality to compare agreement between annotators, to guarantee the inter-subjectivity of the obtained judgments, and to calculate summary statistics giving insights into sense frequency distributions, semantic variation, or changes of senses over time.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the need for a single tool that measures semantic proximity between word uses, clusters those uses into senses, and tracks semantic change. It introduces DURel, an online, open-source annotation tool that supports both human and computational annotation and builds on recent advances in Word-in-Context models. Its main features are:

1. **Standardized Human Annotation**: Annotators judge the semantic proximity of pairs of word uses in simple micro-tasks that require minimal preparation.
2. **Computational Annotation**: Optimized Word-in-Context models produce the same kind of judgments automatically.
3. **Clustering and Visualization of Annotation Results**: Judgments are clustered with automatic graph clustering techniques and visualized for analysis.
4. **Annotator Consistency Comparison**: Agreement between annotators is compared to ensure that the judgments are inter-subjective.
5. **Statistical Analysis**: Summary statistics describe sense frequency distributions, semantic variation, and changes of senses over time.

Through these features, DURel aims to simplify word sense identification and dictionary entry creation and to support the analysis of large-scale data. It is particularly suited to lexicographical work, where it lets lexicographers systematically discover new word senses.
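
The clustering and summary-statistics features described above can be illustrated with a small sketch. It is not DURel's implementation: the summary does not name the clustering algorithm, so thresholded connected components are used here as a simple stand-in, and the 1 to 4 proximity scale, the 2.5 threshold, the median aggregation, the use identifiers, and the function names `cluster_uses` and `sense_frequencies` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def cluster_uses(judgments, threshold=2.5):
    """Group word uses into sense clusters from pairwise proximity judgments.

    judgments: iterable of (use_a, use_b, score) tuples. Several annotators
    may judge the same pair. Scores are assumed to lie on a 1-4 scale.
    Stand-in method: link a pair if its median score reaches the threshold,
    then take connected components of the resulting graph.
    """
    # Aggregate repeated judgments of the same (unordered) pair.
    by_pair = defaultdict(list)
    for a, b, score in judgments:
        by_pair[tuple(sorted((a, b)))].append(score)

    parent = {}  # union-find forest over use identifiers

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), scores in by_pair.items():
        root_a, root_b = find(a), find(b)  # also registers both uses
        if median(scores) >= threshold:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for use in list(parent):
        clusters[find(use)].add(use)
    # Largest cluster first; ties broken by smallest use identifier.
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


def sense_frequencies(clusters, period_of):
    """Count how many uses of each sense cluster fall into each time period."""
    counts = defaultdict(Counter)
    for sense_id, cluster in enumerate(clusters):
        for use in cluster:
            counts[period_of[use]][sense_id] += 1
    return {period: dict(c) for period, c in counts.items()}


# Hypothetical judgments for five uses of one word (made-up data).
judgments = [
    ("u1", "u2", 4), ("u1", "u2", 3),  # two annotators, same pair
    ("u2", "u3", 4),
    ("u3", "u4", 1),
    ("u4", "u5", 4),
    ("u1", "u5", 2),
    ("u1", "u4", 1),
]
period_of = {"u1": "old", "u2": "old", "u3": "new", "u4": "new", "u5": "new"}

clusters = cluster_uses(judgments)
freqs = sense_frequencies(clusters, period_of)
print(clusters)
print(freqs)
```

In this toy data, a sense cluster that has uses only in the later period would be a candidate for a newly emerged sense, which is the kind of signal the summary statistics are meant to surface. Connected components merge clusters on a single strong link, so a real analysis would likely need a more robust graph clustering method.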