Local Attention Transformers for High-Detail Optical Flow Upsampling

Alexander Gielisse, Nergis Tömen, Jan van Gemert
2024-12-09
Abstract: Most recent works on optical flow use convex upsampling as the last step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this currently widely adopted convex upsampling approach. We propose a series of changes, in an attempt to resolve current issues. First, we propose to decouple the weights for the final convex upsampler, making it easier to find the correct convex combination. For the same reason, we also provide extra contextual features to the convex upsampler. Then, we increase the convex mask size by using an attention-based alternative convex upsampler: Transformers for Convex Upsampling. This upsampler is based on the observation that convex upsampling can be reformulated as attention, and we propose to use local attention masks as a drop-in replacement for convex masks to increase the mask size. We provide empirical evidence that a larger mask size increases the likelihood of the existence of the convex combination. Lastly, we propose an alternative training scheme to remove bilinear interpolation artifacts from the model output. Our proposed ideas could theoretically be applied to almost every current state-of-the-art optical flow architecture. On the FlyingChairs + FlyingThings3D training setting we reduce the Sintel Clean training end-point-error of RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and that of FlowFormer from 0.94 to 0.90, by solely adapting the convex upsampler.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limitations of current optical flow upsampling methods, in particular the widely used convex upsampling approach. Specifically:

1. **Problems with Existing Convex Upsampling Methods**:
   - Existing convex upsampling methods struggle to preserve fine detail when producing high-resolution optical flow.
   - Convex upsampling expresses each high-resolution flow vector as a convex combination of neighboring low-resolution pixels, which limits its performance, especially in complex scenes and at edges.
2. **Improvement Goals**:
   - The authors propose a series of changes that aim to improve the accuracy of optical flow upsampling, especially in high-detail areas.
   - Reformulate convex upsampling as local attention in order to increase the mask size and better capture context information.
   - Propose a hierarchical upsampling method that gradually upsamples the low-resolution optical flow to full resolution instead of upsampling it by 8x in a single step.
   - Explore an alternative training scheme that removes the artifacts caused by bilinear interpolation from the model output.
3. **Specific Improvement Measures**:
   - **Decouple Weights**: Decouple the weights for the final convex upsampler to make it easier to find the correct convex combination.
   - **Add Context Features**: Provide additional contextual features to the convex upsampler, for the same reason.
   - **Use Local Attention Mechanism**: Reformulate convex upsampling as local attention and use local attention masks as a drop-in replacement for the traditional convex masks, thereby increasing the mask size.
   - **Hierarchical Upsampling**: Upsample the optical flow map step by step, combining input image features at different scales to better align object edges.
   - **Improve Training Scheme**: Use an alternative training scheme to remove bilinear interpolation artifacts from the model output.
4. **Experimental Results**:
   - With training on FlyingChairs + FlyingThings3D, the method reduces the end-point-error on the Sintel Clean training split: RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and FlowFormer from 0.94 to 0.90, solely by adapting the convex upsampler.
   - The experiments also provide empirical evidence that a larger mask size increases the likelihood that a suitable convex combination exists.

In summary, this paper aims to improve existing optical flow upsampling techniques through methods such as local attention and hierarchical upsampling, in order to better handle high-detail areas and improve overall performance.
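
The observation that convex upsampling can be reformulated as attention can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the edge padding at image borders, the way queries and keys are obtained (random here), and the multiplication of the flow by the upsampling factor (a common convention when flow is measured in pixels) are all assumptions made for this example. Each fine pixel takes a softmax over the k x k coarse neighbours of its parent pixel; in the attention view, the softmax logits come from query-key dot products and the values are the coarse flow vectors.

```python
import numpy as np

def neighbourhoods(x, k):
    """Gather the k x k neighbourhood of every pixel: (C, H, W) -> (C, k*k, H, W)."""
    pad = k // 2
    C, H, W = x.shape
    # Edge padding is an assumption of this sketch; it keeps border outputs convex.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    patches = [xp[:, dy:dy + H, dx:dx + W] for dy in range(k) for dx in range(k)]
    return np.stack(patches, axis=1)

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def convex_upsample(flow, logits, scale, k):
    """flow: (2, H, W); logits: (k*k, scale*scale, H, W) -> (2, scale*H, scale*W)."""
    C, H, W = flow.shape
    weights = softmax(logits, axis=0)            # convex weights over the k*k neighbours
    values = neighbourhoods(scale * flow, k)     # flow rescaled to fine-resolution pixels
    up = np.einsum("cnhw,nshw->cshw", values, weights)
    up = up.reshape(C, scale, scale, H, W)       # split sub-pixel index into (sy, sx)
    return up.transpose(0, 3, 1, 4, 2).reshape(C, scale * H, scale * W)

def local_attention_logits(queries, keys, k):
    """Scaled dot-product logits between each fine-pixel query and its k*k coarse keys.

    queries: (D, scale*scale, H, W), one query per fine pixel (hypothetical layout)
    keys:    (D, H, W), one key per coarse pixel
    returns: (k*k, scale*scale, H, W)
    """
    D = keys.shape[0]
    key_nb = neighbourhoods(keys, k)
    return np.einsum("dshw,dnhw->nshw", queries, key_nb) / np.sqrt(D)

# Usage: random features stand in for learned queries and keys.
rng = np.random.default_rng(0)
H, W, D, scale, k = 4, 5, 8, 8, 3
flow = rng.normal(size=(2, H, W))
queries = rng.normal(size=(D, scale * scale, H, W))
keys = rng.normal(size=(D, H, W))
up = convex_upsample(flow, local_attention_logits(queries, keys, k), scale, k)
print(up.shape)  # (2, 32, 40)
```

In this formulation the mask size is just the neighbourhood size `k`, so enlarging the mask amounts to attending over a larger local window (for example `k = 5`) without changing the rest of the computation.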