On Understanding of Spatiotemporal Prediction Model
Xu Huang, Xutao Li, Yunming Ye, Shanshan Feng, Chuyao Luo, Bowen Zhang
DOI: https://doi.org/10.1109/tcsvt.2022.3232889
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recently, explainable artificial intelligence has received considerable attention. Most existing studies focus on CNN-based image classification and RNN-based time series analysis. In this paper, we turn to the more complicated spatiotemporal predictive learning task (SPLT), in which both spatial and temporal information play important roles. To explain the internal mechanism of spatiotemporal prediction models, we propose a comprehensive analysis method. Specifically, using a typical encoder-decoder framework, we focus on two core issues of SPLT: image generation and spatiotemporal dynamics. For the first issue, we develop a quantitative channel perturbation method to explore the importance of features to the prediction. Furthermore, we propose a technique called the synthesis of multiple independent components to analyze how these features generate the prediction. From the experimental results, we derive the coarse- and fine-grained synthesis (CFGS) mechanism for image generation in SPLT. For the second issue, we propose a state decomposition technique and a state expansion technique to disentangle coupled signals in the spatiotemporal dynamical system. This helps us explore how motion is formed. Moreover, to diagnose the movement of a particular region during analysis, we propose a fluorescent stamp-based technique. Based on extensive experimental results, we summarize a collaboration mechanism that explains how motion is formed in SPLT, namely the extending the present and erasing the past (EPEP) mechanism. To the best of our knowledge, this is the first work to interpret the internal mechanism of SPLT models.
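The abstract does not spell out how the channel perturbation method is computed. As a rough illustration of the general idea only, the sketch below scores each channel of a feature map by how much the decoder output changes when that channel is zeroed. Everything here is an assumption made for illustration and is not the authors' procedure: the zeroing perturbation, the mean-squared-error score, the names `channel_importance` and `make_toy_decoder`, and the toy decoder (a 1x1 convolution, i.e. a weighted sum over channels).

```python
import numpy as np


def channel_importance(features, decoder):
    """Score each channel by how much zeroing it changes the decoder output.

    features: array of shape (C, H, W), a hypothetical encoder feature map.
    decoder:  callable mapping a (C, H, W) array to a predicted frame.
    """
    baseline = decoder(features)
    scores = np.zeros(features.shape[0])
    for c in range(features.shape[0]):
        perturbed = features.copy()
        perturbed[c] = 0.0  # assumed perturbation: suppress one channel
        # Assumed score: mean squared change in the prediction.
        scores[c] = np.mean((decoder(perturbed) - baseline) ** 2)
    return scores


def make_toy_decoder(weights):
    """Toy stand-in for a decoder: a weighted sum over channels (1x1 conv)."""
    w = np.asarray(weights, dtype=float)
    return lambda f: np.tensordot(w, f, axes=1)  # (C,) x (C, H, W) -> (H, W)


# Hypothetical usage with random features and fixed channel weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))
decoder = make_toy_decoder([0.0, 0.5, 1.0, 2.0])
scores = channel_importance(features, decoder)
```

In this toy setting a channel with decoder weight zero receives a score of exactly zero, and channels with larger weights tend to receive larger scores. A real spatiotemporal model would replace the toy decoder with the trained decoder and might use a different perturbation or metric.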