Transformer Explainer: Interactive Learning of Text-Generative Models

Aeree Cho, Grace C. Kim, Alexander Karpekov, Alec Helbling, Zijie J. Wang, Seongmin Lee, Benjamin Hoover, Duen Horng Chau
2024-08-09
Abstract: Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of mathematical operations and model structures. It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. Our tool requires no installation or special hardware, broadening the public's education access to modern generative AI techniques. Our open-sourced tool is available at https://poloclub.github.io/transformer-explainer/. A video demo is available at https://youtu.be/ECR4oAwocjs.
Machine Learning, Artificial Intelligence, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
The paper introduces an interactive visualization tool called TRANSFORMER EXPLAINER, designed to help non-expert users understand how Transformer models (specifically GPT-2) work. The tool addresses the following issues:

1. **Lack of transparency**: Although the Transformer architecture performs excellently in many tasks and is widely used in fields such as AI chatbots, its internal workings remain opaque to many non-professionals.
2. **Existing resources are difficult to understand**: Current educational resources, such as blog posts, video tutorials, and 3D visualizations, often overemphasize mathematical details and model implementations, which can be hard for beginners to digest.
3. **Lack of explanatory tools for non-experts**: Existing visualization tools aimed at AI practitioners usually focus on neuron-level and layer-level interpretability, which also poses challenges for non-experts.

To address these challenges, TRANSFORMER EXPLAINER offers the following solutions:

- **Multi-level abstraction design**: By presenting information at different levels, from high-level overviews to low-level operational details, it helps users gradually understand the connection between the Transformer model structure and mathematical operations.
- **Real-time interactive features**: It integrates a GPT-2 model instance that runs in the user's browser, allowing users to input their own text and observe in real time how the model predicts the next token. Users can also adjust key parameters (such as temperature) to see directly how these parameters affect prediction results.
- **Easy accessibility**: It can be used without installing additional software or special hardware, lowering the barrier to learning modern generative AI technologies.

Through these features, TRANSFORMER EXPLAINER not only helps demystify Transformer models but also promotes interactive learning between educators and students, enabling more people to grasp the fundamental knowledge of this important technology.
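
To make the temperature parameter mentioned above concrete, the following is a minimal sketch of the standard temperature-scaled softmax used when sampling from language models. It is not taken from the tool's source code: the function name `softmax_with_temperature` and the example logits are hypothetical, chosen only for illustration. The idea is that raw scores (logits) are divided by the temperature before being normalized into probabilities, so lower temperatures concentrate probability on the top candidate and higher temperatures flatten the distribution.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens (illustrative values only).
logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Running the loop shows the probability of the highest-scoring token shrinking as the temperature rises, which is the effect a user would observe when moving a temperature control while watching next-token predictions.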