Making Sense of Failure Logs in an Industrial DevOps Environment

Muhammad Abbas, Ali Hamayouni, Mahshid Helali Moghadam, Mehrdad Saadatmand, Per Erik Strandberg
DOI: https://doi.org/10.48550/arXiv.2301.03450
2023-01-09
Software Engineering
Abstract: Processing and reviewing nightly test execution failure logs for large industrial systems is a tedious activity. Furthermore, multiple failures might share one root/common cause during test execution sessions, and the review might therefore require redundant effort. This paper presents the LogGrouper approach for automated grouping of failure logs to aid root/common cause analysis and to enable the processing of each log group as a batch. LogGrouper uses state-of-the-art natural language processing and clustering approaches to achieve meaningful log grouping. The approach is evaluated in an industrial setting in both a qualitative and a quantitative manner. Results show that LogGrouper produces good-quality groupings in terms of our two evaluation metrics for clustering quality (Silhouette Coefficient and Calinski-Harabasz Index). The qualitative evaluation shows that experts perceive the groups as useful, and the groups are seen as an initial pointer for root cause analysis and failure assignment.
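The two clustering-quality metrics named in the abstract can be made concrete with a small sketch. This is not LogGrouper's implementation: the abstract does not say which NLP representation or clustering algorithm is used, so the sketch assumes a toy bag-of-words vectorization (the hypothetical `bag_of_words` helper) and hand-assigned group labels standing in for a clustering step's output. Only the metric definitions follow the standard formulas: the Silhouette Coefficient averages s(i) = (b - a) / max(a, b) over all logs, and the Calinski-Harabasz Index is the ratio of between-group to within-group dispersion, each divided by its degrees of freedom. All function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def bag_of_words(logs: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for an NLP representation: token -> count."""
    tokenized = [log.lower().split() for log in logs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    return [[float(toks.count(term)) for term in vocab] for toks in tokenized]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def _groups(points: List[Vector], labels: List[int]) -> Dict[int, List[Vector]]:
    """Collect the vectors belonging to each group label."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def silhouette_coefficient(points: List[Vector], labels: List[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points, in [-1, 1].

    a: mean distance from point i to the other members of its own group.
    b: smallest mean distance from point i to the members of another group.
    Points in singleton groups score 0, the usual convention.
    """
    groups = _groups(points, labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two groups")
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # p's distance to itself is 0, so this averages over the others only.
        a = sum(math.sqrt(_sq_dist(p, q)) for q in own) / (len(own) - 1)
        b = min(
            sum(math.sqrt(_sq_dist(p, q)) for q in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def calinski_harabasz_index(points: List[Vector], labels: List[int]) -> float:
    """(B / (k - 1)) / (W / (n - k)): between- vs within-group dispersion."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    if not 1 < k < n:
        raise ValueError("need at least two groups and fewer groups than points")
    overall = _centroid(points)
    between = within = 0.0
    for members in groups.values():
        c = _centroid(members)
        between += len(members) * _sq_dist(c, overall)
        within += sum(_sq_dist(x, c) for x in members)
    if within == 0:  # ratio undefined; this sketch reports infinity
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    logs = [
        "timeout connecting to node A",
        "timeout connecting to node B",
        "assertion failed in test_login",
        "assertion failed in test_logout",
    ]
    vectors = bag_of_words(logs)
    # Hand-assigned labels standing in for a clustering algorithm's output.
    for name, labels in [("by cause", [0, 0, 1, 1]), ("mixed", [0, 1, 0, 1])]:
        print(name, silhouette_coefficient(vectors, labels),
              calinski_harabasz_index(vectors, labels))
```

On these four toy logs, grouping the two timeouts together and the two assertion failures together scores higher on both metrics than the mixed grouping, whose silhouette is negative. This mirrors the intuition that a useful grouping puts failures with a shared cause in the same batch.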
What problem does this paper attempt to address?