The Attention Mechanism Demystified

Rohan K. Mehta
Abstract: The transformer, a neural net architecture proposed in Vaswani et al., 2017, has taken the NLP world by storm in the past few years, contributing to the recent success of OpenAI's GPT-3 among many others. But while there is an abundance of online material describing the transformer architecture, its crucial theoretical contribution (the key, query, and value attention mechanism) lacks a clear, accessible explanation, especially from the perspective of what motivated it and why it works. The focus of this post is to provide some of that intuition and to demonstrate why the attention mechanism is so powerful.
Computer Science