Identifying Erroneous Software Changes through Self-Supervised Contrastive Learning on Time Series Data

Xuanrun Wang, Kanglin Yin, Qianyu Ouyang, Xidao Wen, Shenglin Zhang, Wenchi Zhang, Li Cao, Jiuxue Han, Xing Jin, Dan Pei
DOI: https://doi.org/10.1109/ISSRE55969.2022.00043
2022-01-01
Abstract: Software changes are frequent and inevitable. However, erroneous software changes may cause failures and incidents, degrading user experience and system stability. Thus, it is critical to distinguish erroneous software changes from normal ones. Our empirical study of a global data center reveals that erroneous software changes caused nearly one-third of the critical incidents in the last two years. Quantitative results also imply that both the number of software changes and the number of Key Performance Indicator (KPI) time series related to each software change are relatively large. Based on these observations, we propose Kontrast, a self-supervised, generic, and adaptive approach based on contrastive learning that aims to identify erroneous software changes in a timely manner. Its key idea is to compare the pre-change and post-change KPI time series related to a software change, verifying that the time series remain in a normal state after the change. Since contrastive learning approaches need a fully labeled dataset, we propose a novel data augmentation technique, inspired by self-supervised learning, that generates data with pseudo labels. Our model significantly outperforms all compared approaches on two datasets, runs at millisecond-level speed per KPI, and is shown to achieve cross-dataset adaptability. To further demonstrate our contribution, we also present several success cases from the deployment of Kontrast.
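The abstract names two mechanisms: comparing pre-change and post-change KPI windows, and a data augmentation step that manufactures pseudo-labeled data so contrastive training needs no manual labels. The paper's actual augmentation and network are not described here, so the following is only a minimal sketch under stated assumptions. It cuts adjacent windows from a KPI assumed to be normal, labels untouched pairs 0 ("normal") and pairs whose post-change window receives a synthetic level shift or spike 1 ("erroneous"), and scores each pair with the classic pairwise contrastive loss over a placeholder (mean, std) encoder. The names `inject_anomaly`, `make_pseudo_labeled_pairs`, `embed`, and `contrastive_loss`, as well as the `magnitude` and `margin` values, are hypothetical illustrations, not Kontrast's implementation.

```python
import math
import random
from typing import List, Tuple

# (pre-change window, post-change window, pseudo label); 0 = "normal", 1 = "erroneous"
Pair = Tuple[List[float], List[float], int]


def inject_anomaly(window: List[float], rng: random.Random, magnitude: float = 3.0) -> List[float]:
    """Hypothetical augmentation: return a copy of `window` with a level shift or a spike."""
    out = list(window)
    mean = sum(out) / len(out)
    std = math.sqrt(sum((x - mean) ** 2 for x in out) / len(out)) or 1.0  # avoid 0 for flat windows
    if rng.random() < 0.5:
        onset = rng.randrange(len(out))  # level shift from a random onset to the end of the window
        for i in range(onset, len(out)):
            out[i] += magnitude * std
    else:
        out[rng.randrange(len(out))] += 2 * magnitude * std  # single spike at a random position
    return out


def make_pseudo_labeled_pairs(series: List[float], window: int, n_pairs: int, seed: int = 0) -> List[Pair]:
    """Cut adjacent pre/post windows from an assumed-normal KPI and label them automatically."""
    rng = random.Random(seed)
    pairs: List[Pair] = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(series) - 2 * window + 1)
        pre = series[start:start + window]
        post = series[start + window:start + 2 * window]
        if rng.random() < 0.5:
            pairs.append((pre, post, 0))  # untouched post-window: pseudo label "normal"
        else:
            pairs.append((pre, inject_anomaly(post, rng), 1))  # perturbed: pseudo label "erroneous"
    return pairs


def embed(window: List[float]) -> Tuple[float, float]:
    """Placeholder encoder returning (mean, std); a trained network would replace this."""
    mean = sum(window) / len(window)
    return mean, math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))


def contrastive_loss(pre: List[float], post: List[float], label: int, margin: float = 1.0) -> float:
    """Pairwise contrastive loss: pull normal pairs together, push erroneous pairs beyond `margin`."""
    d = math.dist(embed(pre), embed(post))
    return (1 - label) * d ** 2 + label * max(0.0, margin - d) ** 2
```

For example, `make_pseudo_labeled_pairs([10 + math.sin(t / 5) for t in range(500)], window=30, n_pairs=8)` yields eight labeled pairs that can be fed to `contrastive_loss`. In a real system the placeholder `embed` would be a trained encoder, and a post-change window whose embedding lies far from its pre-change counterpart would be flagged as a likely erroneous change.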