Active Learning for Low-Resource Project-Specific Code Summarization

Chengli Xing, Tianxiang Hu, Ninglin Liao, Minghui Zhang, Dongdong Du, Yupeng Wu, Qing Gao
DOI: https://doi.org/10.1007/978-981-97-5489-2_5
2024-01-01
Abstract: Code summarization aims to condense source code into concise summaries, which are crucial for improving code comprehension and maintainability. Deep learning and transfer learning techniques have significantly advanced the state of the art in code summarization. However, constructing high-quality training datasets for these models remains challenging because manual annotation is costly. Active learning offers a promising solution: it selects the most informative samples for annotation, which reduces annotation cost while improving model performance. This paper explores integrating active learning into code summarization, emphasizing the need to balance uncertainty and diversity when selecting samples. Inspired by recent research, the authors propose a novel active learning framework that uses both uncertainty and diversity metrics to construct effective training datasets. This framework provides a foundation for future research in project-specific code summarization.
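The abstract does not say how uncertainty and diversity are combined. As one illustration of the general idea, a common pattern is greedy batch selection: each unlabeled code snippet is scored by a weighted sum of the model's uncertainty about it and its distance to the nearest sample that is already labeled or already picked. The sketch below is not the authors' framework. The function names, the `alpha` weight, the Euclidean distance over snippet embeddings, and the assumption that uncertainty scores lie in [0, 1] (for example, one minus a summarizer's mean token probability) are all hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(
    uncertainty: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    labeled_embeddings: Sequence[Sequence[float]],
    k: int,
    alpha: float = 0.5,
) -> List[int]:
    """Hypothetical greedy selector: pick up to k pool indices, scoring each
    candidate as alpha * uncertainty + (1 - alpha) * normalized distance to
    the nearest labeled or already-selected sample.

    alpha = 1.0 ranks by uncertainty only; alpha = 0.0 by diversity only.
    Uncertainty values are assumed to lie in [0, 1].
    """
    n = len(uncertainty)
    # Distance from each pool sample to its nearest labeled sample
    # (infinite when nothing has been labeled yet).
    min_dist = [
        min((euclidean(e, l) for l in labeled_embeddings), default=math.inf)
        for e in embeddings
    ]
    selected: List[int] = []
    for _ in range(min(k, n)):
        # Rescale finite distances to [0, 1] so both terms are comparable.
        finite = [d for i, d in enumerate(min_dist)
                  if i not in selected and math.isfinite(d)]
        scale = max(finite) if finite and max(finite) > 0 else 1.0
        best, best_score = -1, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # An uncovered sample (infinite distance) gets the maximum diversity score.
            div = 1.0 if math.isinf(min_dist[i]) else min_dist[i] / scale
            score = alpha * uncertainty[i] + (1 - alpha) * div
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        # The chosen sample now "covers" its neighbours: shrink their distances.
        for i in range(n):
            min_dist[i] = min(min_dist[i], euclidean(embeddings[i], embeddings[best]))
    return selected


if __name__ == "__main__":
    # Toy pool: snippets 0 and 1 are near-duplicates; 2 and 3 sit far away.
    unc = [0.9, 0.85, 0.1, 0.2]
    emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 4.9]]
    print(select_batch(unc, emb, [], k=2, alpha=1.0))  # [0, 1]
    print(select_batch(unc, emb, [], k=2, alpha=0.5))  # [0, 3]
```

With these toy inputs, ranking by uncertainty alone picks the two near-duplicate snippets 0 and 1, while the blended score swaps snippet 1 for the distant snippet 3. This is the kind of redundancy that balancing uncertainty against diversity is meant to avoid.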