Abstract: Recent language models have demonstrated proficiency in summarizing source code. However, as in many other domains of machine learning, language models of code lack sufficient explainability. Informally, we lack a formulaic or intuitive understanding of what and how models learn from code. Explainability of language models can be partially provided if, as the models learn to produce higher-quality code summaries, they also come to treat as important the same code parts that human programmers identify as important. In this paper, we report negative results from our investigation of explainability of language models in code summarization through the lens of human comprehension. We measure human focus on code using eye-tracking metrics such as fixation counts and duration in code summarization tasks. To approximate language model focus, we employ a state-of-the-art model-agnostic, black-box, perturbation-based approach, SHAP (SHapley Additive exPlanations), to identify which code tokens influence the generation of summaries. Using these settings, we find no statistically significant relationship between language models' focus and human programmers' attention. Furthermore, alignment between model and human foci in this setting does not seem to dictate the quality of the LLM-generated summaries. Our study highlights an inability to align human focus with SHAP-based model focus measures. This result calls for future investigation of multiple open questions for explainable language models for code summarization and software engineering tasks in general, including the training mechanisms of language models for code, whether there is an alignment between human and model attention on code, whether human attention can improve the development of language models, and what other model focus measures are appropriate for improving explainability.
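The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes exact Shapley values by enumerating token subsets (the SHAP library approximates this), the scoring function `value_fn` and its `WEIGHTS` are a hypothetical stand-in for a language model's summary score, the fixation counts are made-up numbers, and Spearman rank correlation is an assumed choice of statistic, since the abstract does not name the test used.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value_fn):
    """Exact Shapley value of each of n tokens for a black-box value_fn.

    value_fn takes a frozenset of kept token indices (all other tokens are
    treated as masked out) and returns a score.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def ranks(xs):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            r[idx] = mean_rank
        pos = end + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical example: tokens of a tiny function signature.
tokens = ["def", "add", "(", "a", ",", "b", ")"]

# Hypothetical stand-in for "summary quality when only these tokens are kept".
WEIGHTS = {"def": 0.1, "add": 0.6, "a": 0.15, "b": 0.15}

def value_fn(kept):
    return sum(WEIGHTS.get(tokens[i], 0.0) for i in kept)

phi = shapley_values(len(tokens), value_fn)

# Made-up per-token fixation counts standing in for eye-tracking data.
fixation_counts = [2, 9, 1, 4, 1, 5, 1]

rho = spearman(phi, fixation_counts)
print("model focus (Shapley):", [round(p, 3) for p in phi])
print("Spearman correlation with human fixations:", round(rho, 3))
```

Exact enumeration costs on the order of 2^n model calls per token, which is only feasible for toy inputs like this one; that cost is why sampling-based approximations are used on real code snippets.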

WheaCha: A Method for Explaining the Predictions of Models of Code

An Explanation Method for Models of Code

WheaCha: A Method for Explaining the Predictions of Code Summarization Models

Demystifying Code Summarization Models.

Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data

Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations

Explaining Language Models' Predictions with High-Impact Concepts

Towards a Deep and Unified Understanding of Deep Neural Models in NLP

Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?

A Unified Approach to Interpreting Model Predictions

Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions

The Weighted Möbius Score: A Unified Framework for Feature Attribution

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

Latent Concept-based Explanation of NLP Models

MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks

Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Provably Better Explanations with Optimized Aggregation of Feature Attributions

Generic Attention-model Explainability by Weighted Relevance Accumulation

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

Asymmetric feature interaction for interpreting model predictions