Practitioners' Expectations on Log Anomaly Detection

Xiaoxue Ma, Yishu Li, Jacky Keung, Xiao Yu, Huiqi Zou, Zhen Yang, Federica Sarro, Earl T. Barr
2024-12-02
Abstract: Log anomaly detection has become a common practice for software engineers to analyze software system behavior. Despite significant research efforts in log anomaly detection over the past decade, it remains unclear what are practitioners' expectations on log anomaly detection and whether current research meets their needs. To fill this gap, we conduct an empirical study, surveying 312 practitioners from 36 countries about their expectations on log anomaly detection. In particular, we investigate various factors influencing practitioners' willingness to adopt log anomaly detection tools. We then perform a literature review on log anomaly detection, focusing on publications in premier venues from 2014 to 2024, to compare practitioners' needs with the current state of research. Based on this comparison, we highlight the directions for researchers to focus on to develop log anomaly detection techniques that better meet practitioners' expectations.
Software Engineering
What problem does this paper attempt to address?
The main problem this paper attempts to solve is **to understand software engineers' actual expectations for log anomaly detection, and whether current research meets those expectations**. Specifically, the authors aim to fill the following gaps through empirical research:

1. **What are practitioners' expectations for log anomaly detection tools?**
2. **Does current log anomaly detection research meet the needs of practitioners?**

To answer these questions, the authors carried out the following tasks:

### 1. Semi-structured interviews

The authors conducted semi-structured interviews with 15 professionals who have an average of 8.07 years of software development/maintenance experience. Through these interviews, the authors explored the problems practitioners encounter when using log monitoring tools, their views on automated log anomaly detection tools, and their expectations of such tools.

### 2. Large-scale survey

The authors designed and ran an online survey involving 312 software practitioners from 36 countries. The survey aims to verify and expand on the interview findings and to further understand practitioners' expectations for log anomaly detection tools.

### 3. Literature review

The authors reviewed the log anomaly detection literature published in top-level conferences and journals from 2014 to 2024, and identified the gaps between the solutions proposed in these studies and the actual needs of practitioners.

### Main research questions and results

#### RQ1: Current situation and problems of log monitoring tools

- **Usage**: More than half (55.8%) of the respondents said that they use log monitoring tools for log anomaly detection, with Elastic and Amazon CloudWatch Logs being the most commonly used options.
- **Problems**: Approximately one-third of the users think that these tools cannot automatically identify log anomalies and still require manual analysis. In addition, many tools rely on keyword-based heuristic methods, and only a few adopt machine-learning or deep-learning techniques.

#### RQ2: Importance of automated log anomaly detection tools

- **Importance**: 95.5% of the respondents think that automated log anomaly detection tools are very important or worth using. Many practitioners believe that user-friendly automated tools can improve the effectiveness and efficiency of log anomaly detection and reduce manual workload.

#### RQ3: Practitioners' expectations for automated log anomaly detection tools

- **Detection granularity**: 70.5% of the practitioners prefer log-sequence-level analysis over single-log-event-level analysis.
- **Evaluation metrics**: The evaluation metrics that practitioners care most about are recall (the ability to identify real anomalies), precision (whether the identified anomalies are real anomalies), and real-time detection efficiency. More than 70% of the respondents hope that recall and precision are higher than 60%.
- **Other expectations**: More than 78% of the practitioners hope that the tools can handle diverse log structures and provide explanations; more than half hope that the tools can identify anomalies within 5 seconds, and that installation, configuration, and learning time do not exceed 1 hour.

#### RQ4: Gaps between current research and practitioners' needs

- **Utilization of data resources**: Although 83.7% and 74.9% of the practitioners said that historical labeled log data, performance metrics and trace records are available, only 4 papers mentioned using these data.
- **Detection granularity**: Only 6 papers explicitly performed log-event-level anomaly detection, which is the second-best choice for practitioners.
- **Real-time detection**: More than half of the studies did not report detection time, although most practitioners emphasize the importance of real-time detection.
- **Diversity handling**: Few studies discuss how to handle logs with different structures, provide explanations, support customization, or protect privacy, all of which practitioners care about.

### Summary

Using a mixed-method approach (interviews, a survey, and a literature review), this paper reveals the gaps between current log anomaly detection research and practitioners' needs, and proposes improvement directions to better meet those needs.
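
To make the terms used under RQ1 and RQ3 concrete, the following minimal Python sketch shows a keyword-based heuristic applied at log-sequence level, together with the precision and recall computation practitioners ask about. It is only an illustration and is not taken from the paper: the keyword list, the function names, and the toy log sequences are all hypothetical, and the 60% threshold is the expectation reported in the survey rather than a property of any specific tool.

```python
# Hypothetical keyword list; real monitoring tools define their own rules.
ANOMALY_KEYWORDS = ("error", "exception", "fail", "timeout")

# Survey-reported expectation: recall and precision above 60%.
MIN_ACCEPTABLE = 0.60


def event_is_suspicious(line: str) -> bool:
    """Log-event-level check: does a single log line contain a keyword?"""
    lowered = line.lower()
    return any(keyword in lowered for keyword in ANOMALY_KEYWORDS)


def sequence_is_anomalous(events: list[str]) -> bool:
    """Log-sequence-level check: flag the sequence if any event is suspicious."""
    return any(event_is_suspicious(event) for event in events)


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up data: (log sequence, ground-truth "is anomalous" label).
labeled_sequences = [
    (["Connection opened", "Request served in 12 ms"], False),
    (["Disk write failed", "Retrying write"], True),
    (["User login error: bad password", "User login ok"], False),
    (["Latency 9500 ms on /checkout", "Latency 9800 ms on /checkout"], True),
    (["Worker exited: NullPointerException", "Worker restarted"], True),
]

predictions = [sequence_is_anomalous(events) for events, _ in labeled_sequences]
labels = [label for _, label in labeled_sequences]
precision, recall = precision_recall(predictions, labels)
meets_expectation = precision > MIN_ACCEPTABLE and recall > MIN_ACCEPTABLE

print(f"precision={precision:.2f} recall={recall:.2f} ok={meets_expectation}")
```

On this toy data the heuristic raises a false alarm on the benign login error and misses the latency anomaly, which contains no keyword. This is the kind of limitation that leads respondents to say keyword-based tools still require manual analysis.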