Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Charles O'Neill, David Klindt
2024-11-20
Abstract: A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. In this paper, we investigate sparse inference and learning in SAEs through the lens of sparse coding. Specifically, we show that SAEs perform amortised sparse inference with a computationally restricted encoder and, using compressed sensing theory, we prove that this mapping is inherently insufficient for accurate sparse inference, even in solvable cases. Building on this theory, we empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our key contribution is the decoupling of the encoding and decoding processes, which allows for a comparison of various sparse encoding strategies. We evaluate these strategies on two dimensions: alignment with true underlying sparse features and correct inference of sparse codes, while also accounting for computational costs during training and inference. Our results reveal that substantial performance gains can be achieved with minimal increases in compute cost. We demonstrate that this generalises to SAEs applied to large language models (LLMs), where advanced encoders achieve similar interpretability. This work opens new avenues for understanding neural network representations and offers important implications for improving the tools we use to analyse the activations of large language models.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the shortcomings of sparse autoencoders (SAEs) at sparse inference. The authors argue that although SAEs can extract interpretable features from neural network representations, their simple linear-nonlinear encoder limits how accurately they can infer sparse codes. Because the encoder is computationally restricted, SAEs exhibit an "amortisation gap": the difference between the sparse code predicted by the SAE encoder and the optimal sparse code that an unconstrained sparse inference algorithm could produce.

### Main problem summary:

1. **Accuracy of sparse inference**: The simple encoding mechanism of SAEs prevents accurate sparse inference, even in cases that are solvable in principle.
2. **Amortisation gap**: Because the encoder is computationally restricted, SAEs cannot reach the optimal sparse code, which results in an amortisation gap.
3. **Better sparse inference methods**: The paper explores whether more sophisticated sparse inference methods can outperform the traditional SAE encoder while keeping the increase in computational cost small.

### Core contributions of the paper:

- **Decoupling the encoding and decoding processes**: Separating encoding from decoding lets the authors compare different sparse encoding strategies on two criteria: alignment with the true underlying sparse features and correct inference of sparse codes.
- **Experimental verification**: Experiments on synthetic datasets and on real activations (from the large language model GPT-2) show that more sophisticated methods can substantially improve performance at a relatively small increase in computational cost.
- **Theoretical analysis**: Compressed sensing theory is used to prove the inherent limitations of the SAE encoder in sparse inference and to point to directions for improvement.

Through these studies, the paper offers a new perspective on understanding and improving the tools used to analyse neural network representations, particularly the activations of large language models.
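
The idea of an amortisation gap can be illustrated with a small toy experiment. The sketch below is not the paper's experimental setup: the problem sizes, the penalty weight `lam`, the random dictionary `D`, and the use of non-negative ISTA (proximal gradient descent) as the iterative method are all assumptions chosen for illustration. It compares a single linear-plus-ReLU pass with tied weights, standing in for an untrained SAE-style encoder, against many ISTA iterations on the same fixed dictionary, and reports the difference in sparse-coding loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (not taken from the paper).
d, m, k, n = 32, 64, 3, 200   # input dim, dictionary size, active features, samples
lam = 0.05                    # L1 penalty weight (assumed)

# Fixed random dictionary with unit-norm columns; plays the role of the decoder.
D = rng.normal(size=(d, m))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# Ground-truth non-negative k-sparse codes and their observations x = D z.
Z_true = np.zeros((n, m))
for i in range(n):
    support = rng.choice(m, size=k, replace=False)
    Z_true[i, support] = rng.uniform(0.5, 1.5, size=k)
X = Z_true @ D.T


def objective(Z):
    """Mean sparse-coding loss: 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    resid = X - Z @ D.T
    return np.mean(0.5 * np.sum(resid**2, axis=1) + lam * np.sum(np.abs(Z), axis=1))


# Step size 1/L, where L is the largest eigenvalue of D^T D.
step = 1.0 / np.linalg.norm(D, ord=2) ** 2


def ista(num_iters):
    """Non-negative ISTA started from zero, run for num_iters iterations."""
    Z = np.zeros((n, m))
    for _ in range(num_iters):
        grad = (Z @ D.T - X) @ D  # gradient of the reconstruction term
        # Proximal step for the L1 penalty with a non-negativity constraint.
        Z = np.maximum(Z - step * grad - step * lam, 0.0)
    return Z


# One iteration from zero equals ReLU(step * D^T x - step * lam):
# a single linear map followed by ReLU, i.e. an SAE-style encoder with tied weights.
Z_amortised = ista(1)
Z_iterative = ista(500)

loss_amortised = objective(Z_amortised)
loss_iterative = objective(Z_iterative)
gap = loss_amortised - loss_iterative

print(f"one-pass encoder loss: {loss_amortised:.4f}")
print(f"iterative (ISTA) loss: {loss_iterative:.4f}")
print(f"gap in this toy setup: {gap:.4f}")
```

Because proximal gradient descent with step size 1/L never increases the objective, the iterative loss cannot exceed the one-pass loss here. A trained SAE encoder would likely do better than this untrained stand-in, so the printed gap only illustrates the concept and says nothing about the magnitudes reported in the paper.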