A Tale of Two Comprehensions? Analyzing Student Programmer Attention during Code Summarization
Zachary Karas, Aakash Bansal, Yifan Zhang, Toby Li, Collin McMillan, Yu Huang
DOI: https://doi.org/10.1145/3664808
IF: 3.685
2024-05-15
ACM Transactions on Software Engineering and Methodology
Abstract: Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension, and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is currently unknown how these two forms of code comprehension compare: Reading and Writing. Also, there is a limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n = 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. This can help guide code comprehension in both CS education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code.
computer science, software engineering