Abstract: Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformers to chess, focusing on the critical role of the position representation within the attention mechanism. We show that transformers endowed with a sufficiently expressive position representation can match existing chess-playing models at a fraction of the computational cost. Our architecture, which we call the Chessformer, significantly outperforms AlphaZero in both playing strength and puzzle solving ability with 8x less computation and matches prior grandmaster-level transformer-based agents in those metrics with 30x less computation. Our models also display an understanding of chess dissimilar and orthogonal to that of top traditional engines, detecting high-level positional features like trapped pieces and fortresses that those engines struggle with. This work demonstrates that domain-specific enhancements can in large part replace the need for model scale, while also highlighting that deep learning can make strides even in areas dominated by search-based methods.

What problem does this paper attempt to address?

The main problem this paper addresses is how to improve the position representation of the Transformer model so that it plays chess strongly at a much lower computational cost. Specifically, the authors apply the Transformer model to chess, focusing on the role of position representation in the attention mechanism. They propose an architecture named Chessformer which, by using a more expressive position representation, matches or exceeds existing top-level chess models while significantly reducing the computational cost.

### Main Problems and Solutions

1. **Limitations of Existing Models**:
   - Traditional chess engines rely on specialized tree-search algorithms and hand-designed evaluation functions.
   - Although modern engines such as AlphaZero have introduced deep neural networks, their convolution-based models may be poorly suited to the long-distance interactions that occur in chess.
2. **Advantages of the Transformer Model**:
   - The Transformer model is based on global self-attention and can better handle long-distance interactions.
   - With appropriate domain-specific enhancements (such as position representation), high performance can be achieved without large-scale models.
3. **Importance of Position Representation**:
   - The paper emphasizes the importance of position representation in the attention mechanism and compares three position representation methods: absolute position embedding, relative bias, and the method proposed by Shaw et al.
   - The experimental results show that the method proposed by Shaw et al. performs best in terms of both accuracy and efficiency.
4. **Performance Improvement**:
   - The Chessformer model surpasses AlphaZero in both playing strength and puzzle-solving ability while requiring only about 1/8 of its computation.
   - Compared with previous grandmaster-level transformer-based agents, the Chessformer model achieves the same performance with 30 times less computation.

### Conclusion

By optimizing position representation and making other improvements, the Transformer model shows great potential in chess. This work demonstrates that deep learning can make progress in search-dominated fields, and it indicates the importance of domain-specific enhancements, especially when resources are limited.
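To make the position representation point concrete, here is a minimal sketch of attention logits in the style of Shaw et al., where a learned vector that depends on the relative offset between two squares is added to the key before the dot product. This is not the authors' code. The row-major square numbering, the indexing by (rank, file) offset, the tiny dimension `d = 4`, and the names `rel_index`, `rel_k`, and `shaw_attention_weights` are all assumptions made for illustration, and only the key-side term is shown.

```python
import math
import random

BOARD = 8  # 8x8 board; squares numbered 0..63 in row-major order (an assumption)


def rel_index(i, j):
    """Index the (rank, file) offset from square i to square j into 0..224."""
    dr = (j // BOARD) - (i // BOARD)  # rank offset, in -7..7
    df = (j % BOARD) - (i % BOARD)    # file offset, in -7..7
    return (dr + BOARD - 1) * (2 * BOARD - 1) + (df + BOARD - 1)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def shaw_attention_weights(q, k, rel_k):
    """Attention weights with logit_ij = q_i . (k_j + a_ij) / sqrt(d),
    where a_ij = rel_k[rel_index(i, j)] is a relative-position vector."""
    n, d = len(q), len(q[0])
    weights = []
    for i in range(n):
        logits = []
        for j in range(n):
            a_ij = rel_k[rel_index(i, j)]
            key = [kj + a for kj, a in zip(k[j], a_ij)]
            logits.append(dot(q[i], key) / math.sqrt(d))
        weights.append(softmax(logits))
    return weights


# Random stand-ins for learned parameters (hypothetical values).
rng = random.Random(0)
d = 4
n = BOARD * BOARD
q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
rel_k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range((2 * BOARD - 1) ** 2)]

w = shaw_attention_weights(q, k, rel_k)
```

Under these assumptions the table `rel_k` has one entry per possible offset (15 x 15 of them), so two pairs of squares separated by the same offset share the same vector. Unlike a scalar relative bias, the contribution of `a_ij` to the logit depends on the query `q_i`, which is one way to read "more expressive" here.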

Mastering Chess with a Transformer Model

Amortized Planning with Large-Scale Transformers: A Case Study on Chess

Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Switching to Transformers

Predicting Chess Puzzle Difficulty with Transformers

Chess as a Testbed for Language Model State Tracking

Mastering Chinese Chess AI (Xiangqi) Without Search

Enhancing Chess Reinforcement Learning with Graph Representation

Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models

Multi-Game Decision Transformers

Learning Chess With Language Models and Transformers

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Understanding Transformer Reasoning Capabilities via Graph Algorithms

Vision Transformers for Computer Go

Brainformers: Trading Simplicity for Efficiency

Transcendence: Generative Models Can Outperform The Experts That Train Them

Mastering Atari, Go, chess and shogi by planning with a learned model

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

SentiMATE: Learning to play Chess through Natural Language Processing

Do Transformers Really Perform Badly for Graph Representation?