Contextualized word senses: from attention to compositionality

Pablo Gamallo
DOI: https://doi.org/10.1515/lingvan-2022-0125
2023-12-02
Abstract: The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very fruitful, they continue to be models with little or no interpretability and explainability. One of the tasks for which they are best suited is the encoding of the contextual sense of words using contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and semantic notions such as selection preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures for a given semantic task, namely the similarity calculation of word senses in context. The results obtained show that it is possible to be competitive with linguistically motivated models instead of using the black boxes underlying complex neural architectures.
Computation and Language
What problem does this paper attempt to address?
The paper addresses shortcomings of current language models, especially those based on the Transformer architecture, in interpretability and semantic compositionality. The author proposes a symbolic, linguistically motivated method for encoding the meaning of words in context and compares it with attention-based Transformer models. Specifically, the paper targets three issues:

1. **Model Interpretability**: Although complex neural network models like Transformers have achieved great success in natural language processing tasks, they remain "black box" models, lacking transparency and interpretability.
2. **Semantic Compositionality**: Current models often struggle to capture the systematic compositional rules of natural language, by which the meaning of an expression is derived from the meanings of its parts and the way they are combined.
3. **Data Efficiency**: Compared to humans, these models require large amounts of training data to generalize correctly.

To address these issues, the author proposes a dependency-based compositional model that uses selection preferences and paradigmatic classes to build context-sensitive word meaning representations. The approach is more transparent and interpretable because it constructs the meaning of composite expressions through explicit grammatical rules.

A partial implementation of the model is compared with Transformer-based architectures on one semantic task, the similarity calculation of word senses in context. The results indicate that the model is competitive with complex neural architectures on that task while offering more interpretability and structured knowledge, which mitigates some of the shortcomings of purely neural models.
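To make the idea of dependency-based contextualization more concrete, the following is a minimal sketch, not the paper's implementation. It assumes, purely for illustration, that a word's contextual sense can be approximated by the component-wise product of its vector with the selection preferences of the dependency slot it fills, and that senses are compared with cosine similarity. The feature names, weights, dependency labels, and the helper names `contextualize` and `cosine` are all hypothetical.

```python
import math

# Toy distributional vector for "bank" (features and weights are invented).
BANK = {"money": 0.8, "loan": 0.6, "river": 0.7, "water": 0.5}

# Hypothetical selection preferences: the features that a (relation, word)
# dependency slot expects of the word combined with it.
PREFERENCES = {
    ("dobj", "rob"): {"money": 0.9, "loan": 0.4, "jewel": 0.6},
    ("dobj", "fund"): {"money": 0.5, "loan": 0.9},
    ("nmod", "river"): {"river": 0.9, "water": 0.8},
}


def contextualize(word_vec, pref_vec):
    """Component-wise product: keep only the features shared with the slot."""
    return {f: w * pref_vec[f] for f, w in word_vec.items() if f in pref_vec}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


# "bank" contextualized by three different dependency contexts.
rob_bank = contextualize(BANK, PREFERENCES[("dobj", "rob")])
fund_bank = contextualize(BANK, PREFERENCES[("dobj", "fund")])
river_bank = contextualize(BANK, PREFERENCES[("nmod", "river")])

# The two financial contexts share features, so their similarity is high;
# the river context shares none with them, so its similarity is 0.0.
print(cosine(rob_bank, fund_bank))
print(cosine(rob_bank, river_bank))
```

Every step in this sketch is inspectable: each surviving feature in a contextualized vector can be traced back to the word and to the dependency slot that selected it, which is the kind of transparency the summary contrasts with attention-based models.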