Transformers, Contextualism, and Polysemy

Jumbly Grindrod
2024-09-26
Abstract: The transformer architecture, introduced by Vaswani et al. (2017), is at the heart of the remarkable recent progress in the development of language models, including widely used chatbots such as ChatGPT and Claude. In this paper, I argue that we can extract from the way the transformer architecture works a theory of the relationship between context and meaning. I call this the transformer theory, and I argue that it is novel with regard to two related philosophical debates: the contextualism debate regarding the extent of context-sensitivity across natural language, and the polysemy debate regarding how polysemy should be captured within an account of word meaning.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two related debates in linguistics and the philosophy of language: the contextualism debate and the polysemy debate.

1. **Contextualism Debate**: This debate concerns the extent and nature of context-sensitivity in natural language. Contextualists hold that context-sensitivity is pervasive, so that any sentence can receive different interpretations in different contexts of use. They also generally take this to have important consequences for where the boundary between semantics and pragmatics is drawn.
2. **Polysemy Debate**: This debate concerns how polysemous expressions, that is, words with multiple related meanings, should be represented in the lexicon. A central question is how to distinguish polysemy from homonymy, where a single word form carries unrelated meanings (as with "bank" meaning a financial institution and "bank" meaning the side of a river), and how each should be represented.

The author proposes that analyzing how the transformer architecture processes language data offers a new perspective on both debates. The author calls this perspective the "transformer theory" and argues that it is novel in two respects:

- **The Relationship between Context and Meaning**: The transformer architecture builds context-sensitivity into word representations through its self-attention mechanism, but this sensitivity does not fully match the traditional contextualist view. Instead, it allows a more flexible representation of meaning, one that reflects the influence of context while retaining a degree of fixed meaning.
- **The Processing of Polysemous Words**: When processing a polysemous word, the transformer dynamically adjusts the word's representation according to its context. This differs from traditional accounts of polysemy, which usually assume either that a word has a single fixed meaning or that it has a finite list of distinct senses.
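To make the self-attention point concrete, here is a minimal sketch, not the paper's code, of scaled dot-product self-attention in the form given by Vaswani et al. (2017), softmax(QK^T / sqrt(d_k)) V. It rests on loudly simplifying assumptions: the three-dimensional "static" embeddings and the tiny vocabulary are hypothetical and chosen by hand, the learned projection matrices W_Q, W_K, W_V and multiple heads are omitted (so Q = K = V = the input vectors), and the helper names `self_attention` and `contextual_vector` are illustrative. It uses "bank" only to show context-dependent representation, not to model the homonymy/polysemy distinction.

```python
import math

# Hypothetical 3-dimensional static embeddings, chosen by hand for illustration.
# Real models learn embeddings with hundreds or thousands of dimensions.
EMBEDDINGS = {
    "river": [1.0, 0.0, 0.0],
    "money": [0.0, 1.0, 0.0],
    "bank": [0.5, 0.5, 1.0],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the input vectors.

    Simplification (assumption): a real transformer first applies learned
    projections W_Q, W_K, W_V and uses several attention heads.
    """
    d_k = len(vectors[0])
    outputs = []
    for q in vectors:
        # Similarity of this token's query to every token's key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in vectors]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d_k)]
        outputs.append(out)
    return outputs


def contextual_vector(tokens, target):
    """Return the post-attention vector for `target` within the token sequence."""
    vectors = [EMBEDDINGS[t] for t in tokens]
    return self_attention(vectors)[tokens.index(target)]


river_bank = contextual_vector(["river", "bank"], "bank")
money_bank = contextual_vector(["money", "bank"], "bank")
print([round(x, 3) for x in river_bank])
print([round(x, 3) for x in money_bank])
```

Both calls start from the same static embedding for "bank", yet the output vector is pulled toward the "river" dimension in the first context and toward the "money" dimension in the second. This is one way to picture a representation that is shaped by context while still being anchored in a fixed input embedding.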
In summary, this paper explores whether and how the transformer architecture can shed new light on these two philosophical and linguistic debates.