Zhenhao Li, Chuan Luo, Tse-Hsun Chen, Weiyi Shang, Shilin He, Qingwei Lin, Dongmei Zhang
Abstract: Due to the sheer size of software logs, developers rely on automated techniques for log analysis. One of the first and most important steps of automated log analysis is log abstraction, which parses the raw logs into a structured format. Prior log abstraction techniques aim to identify and abstract all the dynamic variables in logs and output a static log template for automated log analysis. However, these abstracted dynamic variables may also contain important information that is useful to different tasks in log analysis. In this paper, we investigate the characteristics of dynamic variables and their importance in practice, and explore the potential of a variable-aware log abstraction technique. Through manual investigations and surveys with practitioners, we find that different categories of dynamic variables record various information that can be important depending on the given tasks, and that the distinction of dynamic variables in log abstraction can further assist in log analysis. We then propose a deep learning based log abstraction approach, named VALB, which can identify different categories of dynamic variables and preserve the value of specified categories of dynamic variables along with the log templates (i.e., variable-aware log abstraction). Through the evaluation on a widely used log abstraction benchmark, we find that VALB outperforms other state-of-the-art log abstraction techniques on general log abstraction (i.e., when abstracting all the dynamic variables) and also achieves a high variable-aware log abstraction accuracy that further identifies the category of the dynamic variables. Our study highlights the potential of leveraging the important information recorded in the dynamic variables to further improve the process of log analysis.
What problem does this paper attempt to address?
This paper addresses a limitation of existing log abstraction techniques in software log analysis: when they extract static log templates, they discard the information carried by dynamic variables. Traditional log abstraction methods identify and abstract all dynamic variables in logs to produce static templates for automated analysis, yet those abstracted variables may contain information that is valuable for different tasks. The authors therefore study the characteristics of dynamic variables and their importance in practice, and explore variable-aware log abstraction, a technique that distinguishes different categories of dynamic variables, through an approach named VALB.
### Background of the Paper and Problem Definition
1. **Importance of Logs**: Logs record the running behavior of software systems and are crucial for tasks in software development and maintenance processes such as fault diagnosis, program understanding, and anomaly detection.
2. **Challenges in Log Analysis**: Modern software systems generate large volumes of log data every day, which makes manual analysis impractical. Developers therefore usually rely on automated techniques for log analysis.
3. **Log Abstraction Techniques**: Log abstraction (or log parsing) is one of the first and most important steps in automated log analysis. It converts raw logs into a structured format. Existing techniques mainly produce static log templates by identifying and abstracting all dynamic variables.
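
To make general log abstraction concrete, here is a minimal sketch that uses hand-written regular expressions rather than any technique from the paper. The log line, the two patterns, and the `<*>` placeholder are illustrative assumptions.

```python
import re

# Hypothetical patterns for dynamic variables, applied in order.
# Real log parsers infer or learn these instead of hard-coding them.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b\d+\b"),                               # any remaining integer
]

def abstract_log(message: str) -> str:
    """Replace every matched dynamic variable with the placeholder <*>."""
    template = message
    for pattern in VARIABLE_PATTERNS:
        template = pattern.sub("<*>", template)
    return template

template = abstract_log(
    "Received block 4821 of size 67108864 from 10.250.19.102:50010"
)
print(template)  # Received block <*> of size <*> from <*>
```

All three values collapse into the same placeholder, so the template no longer says which one was an identifier, a size, or an address.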
### Limitations of Existing Technologies
1. **Information Loss**: Existing log abstraction techniques may lose important information when abstracting dynamic variables. For example, in anomaly detection, dynamic variables such as execution times and error codes may be very important for distinguishing normal from abnormal behavior.
2. **Importance of Dynamic Variables**: Different dynamic variables record different types of information, and this information may have different importance in different tasks.
### Research Objectives
1. **Study the Characteristics and Importance of Dynamic Variables**: Through manual investigation and surveys with practitioners, understand the characteristics of dynamic variables and their importance in practical applications.
2. **Propose Variable-Aware Log Abstraction (VALB)**: Develop a deep-learning-based log abstraction method that can identify different categories of dynamic variables and retain the values of specified categories as needed.
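
The second objective can be illustrated with a small sketch of what variable-aware output might look like. This is not VALB: the paper uses a deep learning model, whereas the code below substitutes ordered regular expressions. The category names (`ADDRESS`, `DURATION`, `NUMBER`), the patterns, and the `keep` parameter are hypothetical and are not the paper's taxonomy.

```python
import re

# Hypothetical categories and rules, tried in order at each position.
# Assumption: a regex stand-in for the learned model described in the paper.
CATEGORY_PATTERNS = [
    ("ADDRESS", r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),
    ("DURATION", r"\b\d+ ?ms\b"),
    ("NUMBER", r"\b\d+\b"),
]

# One combined pattern with a named group per category.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in CATEGORY_PATTERNS)
)

def variable_aware_abstract(message: str, keep=frozenset()) -> str:
    """Tag each dynamic variable with its category.

    Values whose category is listed in `keep` are preserved verbatim.
    """
    def replace(match: re.Match) -> str:
        category = match.lastgroup  # name of the group that matched
        return match.group(0) if category in keep else f"<{category}>"

    return COMBINED.sub(replace, message)

line = "Request 42 to 10.0.0.5:8080 took 350 ms"
print(variable_aware_abstract(line))
# Request <NUMBER> to <ADDRESS> took <DURATION>
print(variable_aware_abstract(line, keep={"DURATION"}))
# Request <NUMBER> to <ADDRESS> took 350 ms
```

Keeping only the duration value reflects the idea that a downstream task such as anomaly detection may need some categories of variables while the rest can still be abstracted away.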
### Main Contributions
1. **Research on Dynamic Variables**: Revealed the characteristics of dynamic variables and their importance in different tasks, and pointed out the need for a log abstraction technique that can distinguish dynamic variable types.
2. **VALB Method**: Proposed VALB, which is the first method that can further identify dynamic variable types during the log abstraction process. On a widely used log abstraction benchmark, VALB outperforms other state-of-the-art techniques on general log abstraction and achieves high accuracy on variable-aware log abstraction.
3. **Improvement of Downstream Tasks**: Explored the potential of variable-aware log abstraction in assisting log-based downstream tasks (such as anomaly detection), and found that it can improve the performance of anomaly detection.
### Conclusion
This research reveals the importance of dynamic variables in log analysis and proposes a new log abstraction method, VALB, which preserves the valuable information in specified categories of dynamic variables and thereby improves the log analysis process.