Abstract: Most recent works on optical flow use convex upsampling as the last step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this currently widely adopted convex upsampling approach. We propose a series of changes, in an attempt to resolve current issues. First, we propose to decouple the weights for the final convex upsampler, making it easier to find the correct convex combination. For the same reason, we also provide extra contextual features to the convex upsampler. Then, we increase the convex mask size by using an attention-based alternative convex upsampler: Transformers for Convex Upsampling. This upsampler is based on the observation that convex upsampling can be reformulated as attention, and we propose to use local attention masks as a drop-in replacement for convex masks to increase the mask size. We provide empirical evidence that a larger mask size increases the likelihood of the existence of the convex combination. Lastly, we propose an alternative training scheme to remove bilinear interpolation artifacts from the model output. Our proposed ideas could theoretically be applied to almost every current state-of-the-art optical flow architecture. On the FlyingChairs + FlyingThings3D training setting we reduce the Sintel Clean training end-point-error of RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and that of FlowFormer from 0.94 to 0.90, by solely adapting the convex upsampler.
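To make the term concrete, here is a minimal NumPy sketch of convex upsampling in the style commonly used by RAFT-like models: every fine-resolution flow vector is a softmax-weighted (hence convex) combination of the 3×3 coarse neighbours of its parent pixel. The function name `convex_upsample`, the tensor layouts, and the edge padding at the border are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def convex_upsample(flow, logits, scale=8, k=3):
    """Hypothetical helper, for illustration only.

    flow:   (2, H, W) coarse flow field.
    logits: (k*k, scale, scale, H, W) unnormalised mask predicted by a network.
    Returns a (2, H*scale, W*scale) flow field.
    """
    C, H, W = flow.shape
    r = k // 2
    # Softmax over the k*k neighbours gives non-negative weights that sum to 1.
    weights = softmax(logits, axis=0)
    # Assumption: replicate the border so every pixel has a full k x k window.
    padded = np.pad(flow, ((0, 0), (r, r), (r, r)), mode="edge")
    # Gather the k*k neighbours of every coarse pixel: (C, k*k, H, W).
    neigh = np.stack(
        [padded[:, dy:dy + H, dx:dx + W] for dy in range(k) for dx in range(k)],
        axis=1,
    )
    # Displacements are measured in pixels, so they grow with the resolution.
    neigh = scale * neigh
    # Convex combination for each of the scale x scale sub-pixels: (C, s, s, H, W).
    up = np.einsum("nijhw,cnhw->cijhw", weights, neigh)
    # Interleave the sub-pixels into the full-resolution grid.
    return up.transpose(0, 3, 1, 4, 2).reshape(C, H * scale, W * scale)

rng = np.random.default_rng(0)
coarse = rng.normal(size=(2, 4, 5))
mask_logits = rng.normal(size=(9, 8, 8, 4, 5))
fine = convex_upsample(coarse, mask_logits)
print(fine.shape)
```

Because the weights are convex, each output vector stays inside the range spanned by its scaled 3×3 neighbours. If the true high-resolution flow lies outside that range, no 3×3 mask can reproduce it, which is one way to read the abstract's argument for larger masks.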

What problem does this paper attempt to address?

The main problem this paper addresses is the limitations of current optical flow upsampling methods, especially the widely used convex upsampling method. Specifically:

1. **Problems with Existing Convex Upsampling Methods**:
   - Existing convex upsampling methods struggle to preserve fine detail when producing high-resolution optical flow.
   - Convex upsampling computes each high-resolution flow vector as a convex combination of neighboring low-resolution pixels, which limits its performance, especially in complex scenes and at edges.
2. **Improvement Goals**:
   - The authors propose a series of changes that aim to improve the accuracy of optical flow upsampling, especially in high-detail areas.
   - Redefine the convex upsampling process with a local attention mechanism to increase the mask size and better capture context information.
   - Propose a hierarchical upsampling method that gradually upsamples the low-resolution optical flow to full resolution instead of upsampling by a factor of 8 in a single step.
   - Explore different training schemes to reduce the artifacts caused by bilinear interpolation, thereby improving model performance.
3. **Specific Improvement Measures**:
   - **Decouple Weights**: Decouple the weights for the final convex upsampler to make it easier to find the correct convex combination.
   - **Add Context Features**: Provide additional context features to the convex upsampler, for the same reason.
   - **Use Local Attention Mechanism**: Reformulate convex upsampling as local attention and use local attention masks from a Transformer-based upsampler in place of the traditional convex masks, thereby increasing the mask size.
   - **Hierarchical Upsampling**: Upsample the optical flow map step by step, combining input image features at different scales to better align object edges.
   - **Improve Training Scheme**: Explore training schemes that remove bilinear interpolation artifacts from the model output.
4. **Experimental Results**:
   - With training on FlyingChairs + FlyingThings3D, adapting only the convex upsampler reduces the end-point error on the Sintel Clean training set: RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and FlowFormer from 0.94 to 0.90.
   - Experiments also show that a larger mask size makes it more likely that a suitable convex combination exists, which improves performance.

In summary, this paper aims to improve existing optical flow upsampling techniques through local attention and hierarchical upsampling, in order to better handle high-detail areas and improve overall performance.
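The reformulation of convex upsampling as attention can be sketched as follows: the mask weights become a softmax over query-key scores within a local window, and the coarse flow vectors act as the values. This is only an illustrative sketch under stated assumptions, not the authors' architecture. The function `local_attention_upsample`, the feature tensors `queries` and `keys`, the 5×5 window, and the edge padding are all hypothetical choices.

```python
import numpy as np

def local_attention_upsample(flow, queries, keys, scale=2, k=5):
    """Hypothetical helper, for illustration only.

    flow:    (2, H, W) coarse flow, used as attention values.
    queries: (d, H*scale, W*scale) one feature vector per fine pixel.
    keys:    (d, H, W) one feature vector per coarse pixel.
    Returns the upsampled flow (2, H*scale, W*scale) and the
    attention mask (k*k, H*scale, W*scale).
    """
    C, H, W = flow.shape
    d = keys.shape[0]
    r = k // 2
    pad = ((0, 0), (r, r), (r, r))
    # Assumption: replicate the border so every window is complete.
    keys_p = np.pad(keys, pad, mode="edge")
    flow_p = np.pad(scale * flow, pad, mode="edge")
    offsets = [(dy, dx) for dy in range(k) for dx in range(k)]
    # k*k neighbours of every coarse pixel: (d, k*k, H, W) and (C, k*k, H, W).
    key_n = np.stack([keys_p[:, dy:dy + H, dx:dx + W] for dy, dx in offsets], axis=1)
    val_n = np.stack([flow_p[:, dy:dy + H, dx:dx + W] for dy, dx in offsets], axis=1)
    # Every fine pixel attends to the window of its parent coarse pixel.
    key_n = key_n.repeat(scale, axis=2).repeat(scale, axis=3)
    val_n = val_n.repeat(scale, axis=2).repeat(scale, axis=3)
    # Scaled dot-product scores over the local window: (k*k, H*s, W*s).
    scores = np.einsum("dyx,dnyx->nyx", queries, key_n) / np.sqrt(d)
    scores -= scores.max(axis=0, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=0, keepdims=True)
    # The softmax weights are convex, so this is still convex upsampling.
    up = np.einsum("nyx,cnyx->cyx", attn, val_n)
    return up, attn

rng = np.random.default_rng(1)
coarse = rng.normal(size=(2, 3, 4))
k_feat = rng.normal(size=(6, 3, 4))
q_feat = rng.normal(size=(6, 6, 8))
fine, mask = local_attention_upsample(coarse, q_feat, k_feat)
print(fine.shape, mask.shape)
```

In this formulation the window size `k` is a free parameter rather than being tied to the number of output channels of a mask-prediction head, so enlarging the mask only changes how many neighbours are gathered.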

Local Attention Transformers for High-Detail Optical Flow Upsampling

Optical Flow as Spatial-Temporal Attention Learners

Learning By Analogy: Reliable Supervision From Transformations For Unsupervised Optical Flow Estimation

TransFlow: Transformer as Flow Learner

HMAFlow: Learning More Accurate Optical Flow via Hierarchical Motion Field Alignment

GMFlow: Learning Optical Flow Via Global Matching

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

Self-Attention-Based Multiscale Feature Learning Optical Flow with Occlusion Feature Map Prediction

Adaptive Fractional-Order Multi-Scale Optimization TV-L1 Optical Flow Algorithm

Evolution of transformer-based optical flow estimation techniques: a survey

Detail Preserving Residual Feature Pyramid Modules for Optical Flow

LLA-FLOW: A Lightweight Local Aggregation on Cost Volume for Optical Flow Estimation

Flow-Guided Transformer for Video Inpainting

CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

MRDFlow: Unsupervised Optical Flow Estimation Network With Multi-Scale Recurrent Decoder

Learning to Estimate Optical Flow Using Dual-Frequency Paradigm

Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow

Flowformer: Linearizing Transformers with Conservation Flows

Rethinking Optical Flow from Geometric Matching Consistent Perspective

Trustworthy Self-Attention: Enabling the Network to Focus Only on the Most Relevant References

UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model