A Visual Analytics Framework for the Detection of Anomalous Call Stack Trees in High Performance Computing Applications

Cong Xie, Wei Xu, Klaus Mueller
DOI: https://doi.org/10.1109/tvcg.2018.2865026
IF: 5.2
2019-01-01
IEEE Transactions on Visualization and Computer Graphics
Abstract: Anomalous runtime behavior detection is one of the most important tasks for performance diagnosis in High Performance Computing (HPC). Most existing methods find anomalous executions based on the properties of individual functions, such as execution time. However, such properties are insufficient to identify abnormal behavior unless the context of the executions is also taken into account, such as the invocations of child functions and the communications with other HPC nodes. We improve upon the existing anomaly detection approaches by utilizing the call stack structures of the executions, which record rich temporal and contextual information. With our call stack tree (CSTree) representation of the executions, we formulate the anomaly detection problem as finding anomalous tree structures in a call stack forest. The CSTrees are converted to vector representations using our proposed stack2vec embedding. Structural and temporal visualizations of CSTrees are provided to support users in the identification and verification of the anomalies during an active anomaly detection process. Three case studies of real-world HPC applications demonstrate the capabilities of our approach.
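To make the idea of a call stack tree concrete, the following minimal Python sketch rebuilds trees from a stream of function entry/exit events and turns each tree into a fixed-length vector. Everything here is an assumption for illustration: the event format, the names `Node`, `build_trees`, and `embed`, and the embedding itself (per-function call counts plus root duration). It is a simple stand-in, not the stack2vec embedding described in the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One function invocation in a call stack tree (hypothetical structure)."""
    name: str
    entry: float
    exit: float = 0.0
    children: list = field(default_factory=list)


def build_trees(events):
    """Rebuild call stack trees from (timestamp, kind, function) events.

    `kind` is assumed to be either "entry" or "exit", and events are
    assumed to be ordered by time and properly nested.
    """
    roots, stack = [], []
    for ts, kind, name in events:
        if kind == "entry":
            node = Node(name, ts)
            # Attach to the currently running function, or start a new tree.
            (stack[-1].children if stack else roots).append(node)
            stack.append(node)
        else:
            node = stack.pop()
            if node.name != name:
                raise ValueError(f"unbalanced exit for {name!r}")
            node.exit = ts
    return roots


def embed(root, vocab):
    """Illustrative embedding: call count per function in `vocab`,
    followed by the total duration of the root invocation."""
    counts = Counter()
    pending = [root]
    while pending:
        node = pending.pop()
        counts[node.name] += 1
        pending.extend(node.children)
    return [counts[f] for f in vocab] + [root.exit - root.entry]


events = [
    (0, "entry", "main"),
    (1, "entry", "compute"),
    (2, "exit", "compute"),
    (3, "entry", "send"),
    (5, "exit", "send"),
    (6, "exit", "main"),
]
trees = build_trees(events)
vectors = [embed(t, ["main", "compute", "send"]) for t in trees]
```

Vectors of this kind could then be passed to any standard outlier detector; which detector and which features the authors actually use is not stated in the abstract.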