Achieving High-Level Software Component Summarization via Hierarchical Chain-of-Thought Prompting and Static Code Analysis

Satrio Adi Rukmono,Lina Ochoa,M. Chaudron
DOI: https://doi.org/10.1109/ICoDSE59534.2023.10292037
2023-09-07
Abstract:Comprehension of software systems is key to their successful maintenance and evolution. This comprehension comes at different levels of abstraction: At the low level, one must focus on comprehending functions; while at the high level, one must abstract and comprehend the system's requirements. Diverse Automated Source Code Summarization (ASCS) techniques have been proposed to comprehend systems at the lower level. However, techniques for abstracting higher-level explanations fall short. Research on related fields, such as software architecture recovery, has tried to address system comprehension at the higher level by attempting to detect abstractions of design decisions from source code. Nevertheless, this is an on-going effort and many steps in the process are still unsolved. In this paper, we lever-age the emergent abilities of Large Language Models (LLMs) together with the achievements in the ASCS and static code analysis fields to design an approach that produces component-level summaries of software systems. Particularly, we address the unreliability of LLMs in performing reasoning by applying a chain-of-thought prompting strategy, which allows us to emulate inductive reasoning. We follow a bottom-up approach, where we start by comprehending lower-level software abstractions (e.g., functions), and then we compose these findings-in a cascading style-to comprehend higher-level ones (e.g., classes, components). We demonstrate the feasibility of our approach by applying it to the open-source Java project JHotDraw version 5.1. We believe our approach offers a stepping stone in developing robust automated software summarization approaches that can be applied generally across domains and types of software system.
Computer Science
What problem does this paper attempt to address?