Deep Optical Flow Learning with Deformable Large-Kernel Cross-Attention

Xuezhi Xiang, Yiming Chen, Denis Ombati, Lei Zhang, Xiantong Zhen
DOI: https://doi.org/10.1109/icip51287.2024.10647703
2024-01-01
Abstract: Optical flow estimation from image sequences is a fundamental problem in computer vision. In recent years, some methods have used Transformers to model global dependencies and improve optical flow estimation, achieving impressive performance. However, in these methods, Transformers typically treat two-dimensional image features as one-dimensional sequences. While position encoding partially mitigates the loss of position information between different feature patches, Transformers still lack inherent biases for modeling local visual patterns and tend to overlook channel characteristics in image features. Therefore, this paper introduces a deformable large-kernel attention module that combines the strengths of convolution and attention mechanisms. The module preserves feature channel adaptability while modeling global dependencies without compromising the two-dimensional structure of features, significantly enhancing optical flow estimation. Additionally, the deformable mechanism allows the model to adapt to different data patterns. Experimental results demonstrate that our optical flow estimation method achieves competitive results on public benchmarks such as Sintel and KITTI.
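The abstract does not spell out how the module is built, so the following is only a rough illustration of the general idea of large-kernel attention with a deformable sampling step, not the authors' implementation. It assumes the commonly used decomposition of a large kernel into a depthwise convolution, a dilated depthwise convolution, and a 1x1 channel-mixing convolution, whose output gates the input feature map elementwise while keeping its 2-D layout. The function names, the single offset shared by all taps, and the pure-Python list representation are all hypothetical simplifications; the cross-attention between the two frames is not modeled here.

```python
import math

def bilinear(img, y, x):
    """Sample a 2-D list at fractional (y, x); out-of-bounds pixels read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val

def deform_depthwise(x, kernels, dilation=1, offset=(0.0, 0.0)):
    """Depthwise conv over x[c][i][j], one square kernel per channel.
    Assumption: every tap is shifted by the same `offset`; a real deformable
    conv would predict a separate offset per tap and per location."""
    out = []
    for ch, k in zip(x, kernels):
        r = len(k) // 2
        h, w = len(ch), len(ch[0])
        out.append([[sum(k[a][b] * bilinear(ch,
                                            i + (a - r) * dilation + offset[0],
                                            j + (b - r) * dilation + offset[1])
                         for a in range(len(k)) for b in range(len(k)))
                     for j in range(w)] for i in range(h)])
    return out

def pointwise(x, weights):
    """1x1 conv: mixes channels at each pixel, weights[out_ch][in_ch]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for wo in weights]

def large_kernel_attention(x, k_local, k_dilated, w_mix,
                           dilation=3, offset=(0.0, 0.0)):
    """Build an attention map with local + dilated depthwise convs and a
    1x1 conv, then gate the input with it. The map has the same C x H x W
    shape as x, so the 2-D structure is never flattened."""
    a = deform_depthwise(x, k_local, 1, offset)
    a = deform_depthwise(a, k_dilated, dilation, offset)
    a = pointwise(a, w_mix)
    return [[[a[c][i][j] * x[c][i][j] for j in range(len(x[0][0]))]
             for i in range(len(x[0]))] for c in range(len(x))]
```

With a zero offset the sampling step reduces to an ordinary depthwise convolution; a fractional offset makes each tap read a bilinear blend of neighboring pixels, which is the mechanism that lets a deformable kernel adapt its sampling positions to the data.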