A Cloud-Based Triage Log Analysis and Recovery Framework.

Guanqiu Qi, Wei-Tek Tsai, Wu Li, Zhiqin Zhu, Yong Luo
DOI: https://doi.org/10.1016/j.simpat.2017.07.003
IF: 4.199
2017-01-01
Simulation Modelling Practice and Theory
Abstract: With the development of cloud infrastructure, more and more transaction processing systems are hosted on cloud platforms. Logs, which usually record the production behavior of a cloud-hosted transaction processing system, are widely used for triaging production failures. Log analysis of a cloud-based system faces challenges as data size increases, unstructured formats emerge, and untraceable failures occur more frequently. New requirements for log analysis have also arisen, such as real-time analysis and failure recovery. Existing solutions each have their own focus and cannot fulfill these growing requirements. To address the main requirements and issues, this paper proposes a new log model that classifies and analyzes the interactions of services and the detailed logging information during workflow execution. A workflow analysis technique is used to quickly triage production failures and assist failure recovery. With the proposed log analysis solution, a failed workflow can be reconstructed from failures on real-time production servers. The proposed solution is simulated using a large volume of log data and compared with a traditional solution. The experimental results demonstrate the effectiveness and efficiency of the proposed triage log analysis and recovery solution.