Abstract: Protein language models (PLMs) have demonstrated remarkable success in protein modeling and design, yet their internal mechanisms for predicting structure and function remain poorly understood. Here we present a systematic approach to extract and analyze interpretable features from PLMs using sparse autoencoders (SAEs). By training SAEs on embeddings from the PLM ESM-2, we identify up to 2,548 human-interpretable latent features per layer that strongly correlate with up to 143 known biological concepts such as binding sites, structural motifs, and functional domains. In contrast, examining individual neurons in ESM-2 reveals up to 46 neurons per layer with clear conceptual alignment across 15 known concepts, suggesting that PLMs represent most concepts in superposition. Beyond capturing known annotations, we show that ESM-2 learns coherent concepts that do not map onto existing annotations and propose a pipeline using language models to automatically interpret novel latent features learned by the SAEs. As practical applications, we demonstrate how these latent features can fill in missing annotations in protein databases and enable targeted steering of protein sequence generation. Our results demonstrate that PLMs encode rich, interpretable representations of protein biology and we propose a systematic framework to extract and analyze these latent features. In the process, we recover both known biology and potentially new protein motifs. As community resources, we introduce InterPLM (interPLM.ai), an interactive visualization platform for exploring and analyzing learned PLM features, and release code for training and analysis at github.com/ElanaPearl/interPLM.
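The core step named in the abstract, training a sparse autoencoder on PLM embeddings, can be illustrated with a minimal sketch. It assumes a common SAE formulation (ReLU encoder, linear decoder, squared reconstruction error plus an L1 sparsity penalty); the abstract does not state the exact architecture or loss, so the function names, the toy dimensions, and `l1_coeff` below are all hypothetical. A random vector stands in for a per-residue ESM-2 embedding.

```python
import random


def init_sae(d_model, d_hidden, seed=0):
    """Randomly initialise a toy SAE (dimensions are illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_enc": mat(d_hidden, d_model),  # maps embedding -> latent features
        "b_enc": [0.0] * d_hidden,
        "W_dec": mat(d_model, d_hidden),  # maps latent features -> embedding
        "b_dec": [0.0] * d_model,
    }


def matvec(W, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(params, x):
    """ReLU encoder: non-negative feature activations (assumed form)."""
    pre = matvec(params["W_enc"], x)
    return [max(0.0, a + b) for a, b in zip(pre, params["b_enc"])]


def decode(params, f):
    """Linear decoder: reconstruct the embedding from the features."""
    return [a + b for a, b in zip(matvec(params["W_dec"], f), params["b_dec"])]


def sae_loss(params, x, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 sparsity penalty (assumed)."""
    f = encode(params, x)
    x_hat = decode(params, f)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in f)
    return recon + l1_coeff * sparsity


# Stand-in for one residue's embedding; real ESM-2 embeddings are much wider.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
params = init_sae(d_model=8, d_hidden=32)
features = encode(params, x)
```

The hidden layer is wider than the input here (32 versus 8) because an overcomplete dictionary with a sparsity penalty is the usual way an SAE separates concepts that a model stores in superposition. A real run would minimise `sae_loss` by gradient descent over many embeddings.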

What problem does this paper attempt to address?

The main problem this paper addresses is understanding how protein language models (PLMs) internally represent and predict protein structure and function. Although PLMs have achieved remarkable success in protein modeling and design, their internal mechanisms are still not well understood. Specifically, the authors use sparse autoencoders (SAEs) to extract and analyze interpretable features from PLMs, in order to better understand how these models learn and represent protein biology.

### Main objectives of the paper:

1. **Extract interpretable features**: Train SAEs to extract up to 2,548 human-interpretable latent features per layer from the embeddings of the PLM ESM-2.
2. **Verify the effectiveness of features**: Show that these features correlate strongly with up to 143 known biological concepts (such as binding sites, structural motifs, and functional domains).
3. **Reveal new biological insights**: Show that ESM-2 not only captures known annotations but also learns coherent concepts that do not map onto existing annotations.
4. **Practical applications**: Demonstrate how these latent features can fill in missing annotations in protein databases and enable targeted steering of protein sequence generation.
5. **Provide tools and resources**: Release an interactive visualization platform, InterPLM (interPLM.ai), and code for training and analyzing features (github.com/ElanaPearl/interPLM).

### Main methods and techniques:

- **Sparse autoencoders (SAEs)**: Used to extract latent features from the PLM's embeddings.
- **Feature activation pattern analysis**: Identify structurally and conceptually interpretable features by analyzing how features activate across different proteins.
- **Automatic annotation**: Use large language models (such as Claude-3.5 Sonnet) to automatically annotate latent features.
- **Quantitative evaluation**: Evaluate the interpretability and accuracy of features by comparing them with known Swiss-Prot concept annotations.

### Key findings:

- **More interpretable features**: SAE features capture far more biological concepts than the original ESM-2 neurons: up to 2,548 features per layer aligned with up to 143 concepts, versus up to 46 neurons per layer aligned with 15 concepts.
- **Feature clustering**: Cluster analysis finds groups of features with similar functional and structural roles, revealing a natural hierarchical structure in the model's embedding space.
- **Automatic description generation**: Large language models can generate meaningful feature descriptions that are highly correlated with the actual feature activation values.

In conclusion, through a systematic approach, this paper both improves understanding of the internal mechanisms of PLMs and provides new tools and resources for protein research that can support future biomedical research and applications.
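The quantitative evaluation described above, comparing feature activations with Swiss-Prot concept annotations, can be sketched as a per-residue precision/recall/F1 calculation. This is one plausible scoring scheme, not necessarily the paper's: the summary does not give the metric or the activation threshold, so the function name `feature_concept_f1`, the `threshold` value, and the toy data are all assumptions.

```python
def feature_concept_f1(activations, labels, threshold=0.5):
    """Score how well one feature matches one annotated concept.

    activations: per-residue activation of a single SAE feature.
    labels: per-residue 0/1 flags for a concept annotation.
    threshold: hypothetical cut-off above which the feature counts as "on".
    """
    predicted = [a > threshold for a in activations]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: the feature fires on both annotated residues plus one extra.
toy_activations = [0.9, 0.8, 0.1, 0.0, 0.7]
toy_labels = [1, 1, 0, 0, 0]
precision, recall, f1 = feature_concept_f1(toy_activations, toy_labels)
```

Scoring every feature against every concept this way and keeping the pairs above a chosen F1 cut-off is one way to arrive at counts like "features per layer aligned with known concepts"; the cut-off itself would again be a design choice.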

InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders

Open-Source Protein Language Models for Function Prediction and Protein Design

PLM-interact: extending protein language models to predict protein-protein interactions

S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure

Learning immune receptor representations with protein language models

Structure-Infused Protein Language Models

Long-context Protein Language Model

Efficient Inference, Training, and Fine-tuning of Protein Language Models

From PSSM to Pre-Trained Language Models

The Protein Language Visualizer: Sequence Similarity Networks for the Era of Language Models

Exploring evolution-aware & -free protein language models as protein function predictors

InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions

Does protein pretrained language model facilitate the prediction of protein–ligand interaction?

Interpretable improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein

Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction

From a single sequence to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2 protein sequences

Protein language models meet reduced amino acid alphabets

Exploring Latent Space for Generating Peptide Analogs Using Protein Language Models

Pre-trained Protein Language Model Sheds New Light on the Prediction of Arabidopsis Protein–protein Interactions

Do Protein Language Models Learn Phylogeny?

Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning