GCA-Net: A Global Context Aggregation Network for Effective Optical Flow

Tao Xie, Jinghan Gao, Ke Wang, Ruifeng Li
DOI: https://doi.org/10.1109/iecon51785.2023.10311735
2023-01-01
Abstract: Optical flow estimation seeks the per-pixel 2D motion between two frames by identifying corresponding pixels. Current flow estimators typically involve per-pixel feature extraction, multi-scale 4D correlation volume construction, and iterative flow field updates through a Conv-GRU module. However, the locality of convolutional features in these methods leaves the computed correlations vulnerable to various kinds of noise. In addition, the Conv-GRU module in these methods consists of only a convolution layer, which cannot exploit context cues from larger windows even within the queried image itself, making it more difficult for the model to handle images with challenging regions, e.g., textureless areas. To this end, we introduce GCA-Net, a global context aggregation network for reliable and effective optical flow estimation. More specifically, we propose a highly efficient multi-scale transformer (MSFormer) layer that enables each per-pixel feature to aggregate long-range information from other features, thereby building more accurate 4D correlation volumes. In addition, we develop an attention-enhanced Conv-GRU block (AttGRU) that incorporates information from a larger context window within the image itself, enabling our network to estimate optical flow in even the most challenging regions. Experimentally, we demonstrate that GCA-Net outperforms previous state-of-the-art methods by large margins on the Sintel (Final) and KITTI 2015 (background and foreground) benchmarks.
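To make the "multi-scale 4D correlation volume" step concrete, here is a minimal NumPy sketch of one common formulation: an all-pairs dot product between the two frames' feature maps, followed by average pooling over the second frame's spatial dimensions. This is an illustrative assumption rather than the GCA-Net implementation; the function names `correlation_volume` and `pool_target_dims`, the `1/sqrt(C)` scaling, and the feature shapes are all hypothetical choices made for the example.

```python
import numpy as np


def correlation_volume(feat1, feat2):
    """All-pairs 4D correlation between two (C, H, W) feature maps.

    Assumed formulation: corr[i, j, k, l] is the dot product of the
    feature at pixel (i, j) in frame 1 and pixel (k, l) in frame 2,
    scaled by 1 / sqrt(C).
    """
    channels = feat1.shape[0]
    corr = np.einsum("cij,ckl->ijkl", feat1, feat2)
    return corr / np.sqrt(channels)


def pool_target_dims(corr):
    """2x2 average pooling over the last two (frame 2) dimensions.

    Requires the frame 2 height and width to be even.
    """
    h1, w1, h2, w2 = corr.shape
    blocks = corr.reshape(h1, w1, h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(3, 5))


# Random features stand in for the output of a feature encoder.
rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 4, 6))
f2 = rng.standard_normal((8, 4, 6))

corr = correlation_volume(f1, f2)   # shape (4, 6, 4, 6)
corr_half = pool_target_dims(corr)  # shape (4, 6, 2, 3)
```

Because each entry depends only on two per-pixel feature vectors, noisy or ambiguous local features (for example in textureless areas) directly produce unreliable correlations, which is the weakness the abstract attributes to purely convolutional features.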