MC-Net: Integrating Multi-level Geometric Context for Two-view Correspondence Learning

Zizhuo Li, Chunbao Su, Fan Fan, Jun Huang, Jiayi Ma
DOI: https://doi.org/10.1109/tcsvt.2024.3374772
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: In two-view correspondence learning, prevalent multi-layer perceptron (MLP)-based methods struggle to capture context. To remedy this, recent work stacks convolutional neural network (CNN)-based Resblocks sequentially, which are inherently good at extracting local context. Yet such designs are not tailored to the task and inherit the CNN's difficulty in aggregating global context, leading to performance bottlenecks. To address this problem, this study further explores the potential of the CNN-based framework and proposes MC-Net, a network that integrates both local and global context. Specifically, since sparse motion vectors and a dense motion field can be converted into each other through interpolation and sampling, we first transform unordered matches into image-structured data by implicitly estimating the dense motion field. Then, we design a hierarchical rectifying module that rectifies the error of each ordered motion vector with a CNN at multiple levels, enabling MC-Net to perceive global context from coarse-level features and local context from fine-level features simultaneously, which helps handle discontinuities of the motion field under large scene disparity. Finally, we reconstruct comprehensive context-embedded features from the rectified motion fields at all levels. In addition, previous studies reject outliers using the residuals between rectified and pre-rectified motion vectors at the same layer, which seriously affects inlier prediction accuracy. We instead use the difference between the motion vectors obtained from each layer's reconstruction and those from the first layer before transformation, which yields purer residuals and improves matching performance without extra computational burden. Extensive experiments show that MC-Net outperforms state-of-the-art methods on multiple domains and datasets.
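The conversion between sparse motion vectors and a gridded motion field, and the residual taken against the original input vectors, can be illustrated with a small sketch. This is not MC-Net: the abstract says the dense field is estimated implicitly and rectified with a CNN, whereas the sketch below substitutes explicit Gaussian-kernel averaging for the interpolation step and bilinear lookup for the sampling step, with no learned rectification. The function names, the 8x8 grid size, and the kernel width sigma are all assumptions made for illustration.

```python
import math

def sparse_to_grid(points, motions, grid_size=8, sigma=0.25):
    """Smooth sparse motion vectors onto a grid_size x grid_size field.

    points:  list of (x, y) keypoint positions, assumed normalized to [0, 1]^2
    motions: list of (dx, dy) motion vectors, one per point
    Gaussian-kernel averaging is a stand-in for the learned, implicit estimate.
    """
    centres = [(i + 0.5) / grid_size for i in range(grid_size)]
    field = []
    for cy in centres:                      # rows index y
        row = []
        for cx in centres:                  # columns index x
            wsum = sx = sy = 0.0
            for (px, py), (mx, my) in zip(points, motions):
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                w = math.exp(-d2 / (2.0 * sigma ** 2))
                wsum += w
                sx += w * mx
                sy += w * my
            row.append((sx / wsum, sy / wsum))
        field.append(row)
    return field

def _lerp(a, b, t):
    """Linear interpolation between two 2-vectors."""
    return tuple((1.0 - t) * u + t * v for u, v in zip(a, b))

def sample_grid(field, point):
    """Bilinearly sample the gridded field back at one sparse position."""
    g = len(field)
    # Continuous index measured from the first cell centre, clamped to the grid.
    x = min(max(point[0] * g - 0.5, 0.0), g - 1.0)
    y = min(max(point[1] * g - 0.5, 0.0), g - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, g - 1), min(y0 + 1, g - 1)
    top = _lerp(field[y0][x0], field[y0][x1], x - x0)
    bottom = _lerp(field[y1][x0], field[y1][x1], x - x0)
    return _lerp(top, bottom, y - y0)

def reconstruction_residuals(points, motions, grid_size=8, sigma=0.25):
    """Residual = motion reconstructed from the field minus the original input motion."""
    field = sparse_to_grid(points, motions, grid_size, sigma)
    return [
        (rx - mx, ry - my)
        for (rx, ry), (mx, my) in zip((sample_grid(field, p) for p in points), motions)
    ]
```

In this toy setting, a match whose motion disagrees with its neighbours is smoothed away in the gridded field, so its residual against the original input vector is large, while matches that agree with a smooth field have residuals near zero. Comparing against the original input vectors rather than against an intermediate estimate mirrors, in spirit only, the residual definition the abstract describes.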
engineering, electrical & electronic