Mitigating the impact of mislabeled data on deep predictive models: an empirical study of learning with noise approaches in software engineering tasks

Jian Shen,Zhong Li,Yifei Lu,Minxue Pan,Xuandong Li
DOI: https://doi.org/10.1007/s10515-024-00435-y
IF: 1.677
2024-04-05
Automated Software Engineering
Abstract:Deep predictive models have been widely employed in software engineering (SE) tasks due to their remarkable success in artificial intelligence (AI). Most of these models are trained in a supervised manner, and their performance heavily relies on the quality of training data. Unfortunately, mislabeling or label noise is a common issue in SE datasets, which can significantly affect the validity of models trained on such datasets. Although learning with noise approaches based on deep learning (DL) have been proposed to address the issue of mislabeling in AI datasets, the distinct characteristics of SE datasets in terms of size and data quality raise questions about the effectiveness of these approaches within the SE context. In this paper, we conduct a comprehensive study to understand how mislabeled samples exist in SE datasets, how they impact deep predictive models, and how well existing learning with noise approaches perform on SE datasets. Through an empirical evaluation on two representative datasets for the Bug Report Classification and Software Defect Prediction tasks, our study reveals that learning with noise approaches have the potential to handle mislabeled samples in SE tasks, but their effectiveness is not always consistent. Our research shows that it is crucial to address mislabeled samples in SE tasks. To achieve this, it is essential to take into account the specific properties of the dataset to develop effective solutions. We also highlight the importance of addressing potential class distribution changes caused by mislabeled samples and present the limitations of existing approaches for addressing mislabeled samples. Therefore, we urge the development of more advanced techniques to improve the effectiveness and reliability of deep predictive models in SE tasks.
computer science, software engineering
What problem does this paper attempt to address?