SeeMore: a Spatiotemporal Predictive Model with Bidirectional Distillation and Level-Specific Meta-Adaptation
Yuqing Ma,Wei Liu,Yajun Gao,Yang Yuan,Shihao Bai,Haotong Qin,Xianglong Liu
DOI: https://doi.org/10.1007/s11432-022-3859-8
2024-01-01
Science China Information Sciences
Abstract:Predicting future frames using historical spatiotemporal data sequences is challenging and critical, and it is receiving a lot of attention these days from academic and industrial scholars. Most spatiotemporal predictive algorithms ignore the valuable backward reasoning ability and the disparate learning complexities among different layers and hence, cannot build good long-term dependencies and spatial correlations,resulting in suboptimal solutions. To address the aforementioned issues, we propose a two-stage coarse-to-fine spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation(See More)in this paper, which includes a bidirectional distillation network(BDN) and a level-specific meta-adapter(LMA), to gain bidirectional multilevel reasoning. In the first stage, BDN concentrates on bidirectional dynamics modeling and coarsely constructs spatial correlations of different layers, while LMA is introduced in the second fine-tuning stage to refine the multilevel spatial correlations from a meta-learning perspective.In particular, BDN mimics the forward and backward reasoning abilities of humans in a distillation manner,which aids in the development of long-term dependencies. The LMA views learning of different layers as disparate but related tasks and guides the transfer of learning experiences among these tasks through learning complexities. Thus, each layer could be closer to its solutions and could extract more informative spatial correlations. By capturing the enhanced short-term spatial correlations and long-term temporal dependencies,the proposed model could extract adequate knowledge from sequential historical observations and accurately predict future frames whose backtracking preconditions are consistent with the historical sequence. Our work is general and robust enough to be integrated into most spatiotemporal predictive models without requiring additional computation or memory cost during inference. Extensive experiments on four widely used predictive learning benchmarks validated the proposed model's effectiveness in comparison to state-of-the-art approaches(e.g., 10.6% improvement of Mean Squared Error on the Moving MNIST dataset).