SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation

Huimin Huang,Shiao Xie,Lanfen Lin,Ruofeng Tong,Yen-Wei Chen,Yuexiang Li,Hong Wang,Yawen Huang,Yefeng Zheng
DOI: https://doi.org/10.1109/cvpr52729.2023.01091
2023-01-01
Abstract:Semi-supervised learning improves data efficiency of deep models by leveraging unlabeled samples to alleviate the reliance on a large set of labeled samples. These successes concentrate on the pixel-wise consistency by using convolutional neural networks (CNNs) but fail to address both global learning capability and class-level features for unlabeled data. Recent works raise a new trend that Trans- former achieves superior performance on the entire feature map in various tasks. In this paper, we unify the current dominant Mean-Teacher approaches by reconciling intra- model and inter-model properties for semi-supervised segmentation to produce a novel algorithm, SemiCVT, that absorbs the quintessence of CNNs and Transformer in a comprehensive way. Specifically, we first design a parallel CNN-Transformer architecture (CVT) with introducing an intra-model local-global interaction schema (LGI) in Fourier domain for full integration. The inter-model class- wise consistency is further presented to complement the class-level statistics of CNNs and Transformer in a cross- teaching manner. Extensive empirical evidence shows that SemiCVT yields consistent improvements over the state-of- the-art methods in two public benchmarks.
What problem does this paper attempt to address?