CVIformer: Cross-View Interactive Transformer for Efficient Stereoscopic Image Super-Resolution

Dongyang Zhang, Shuang Liang, Tao He, Jie Shao, Ke Qin
DOI: https://doi.org/10.1109/tetci.2024.3436904
2024-01-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: Inspired by the success of the Transformer in computer vision, recent works have begun to explore Transformers for super-resolution (SR). However, for stereoscopic SR, which aims to recover details from a pair of input views, how to efficiently integrate cross-view interactions into the Transformer architecture remains an open problem. In addition, most existing stereoscopic SR methods apply a parallax mechanism only in the middle of the network, and the feature correlation between viewpoints inevitably weakens as network depth increases. To address these issues, we first use an efficient residual transformer block (ERTB) as the backbone for long-range intra-view feature extraction. We then propose a novel multi-Dconv cross attentive block (MCAB) to enhance cross-view interactions in the rear part of the Transformer architecture. Notably, the proposed MCAB promotes feature fusion between the two viewpoints through bidirectional cross-attention, rather than a unidirectional flow from left to right or vice versa, so that both branches benefit from cross-view interaction. Building on ERTB and MCAB, we introduce an efficient cross-view interaction Transformer (CVIformer) for stereoscopic SR, which incorporates long-range intra-view and cross-view information with acceptable computational overhead. Extensive experiments on four public datasets demonstrate that our model achieves state-of-the-art results with only 1.17 million parameters, approximately 40% fewer than leading methods such as iPASSR.
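To make the idea of bidirectional cross-attention concrete, the following is a minimal NumPy sketch, not the paper's MCAB. It assumes rectified stereo pairs, so each pixel attends only to pixels in the same row of the other view, and it omits the depthwise convolutions implied by "multi-Dconv". The function names, the random projection weights, and the feature shapes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_feat, context_feat, w_q, w_k, w_v):
    """Row-wise cross-attention (assumed layout, not taken from the paper).

    Each pixel of query_feat attends to every pixel in the same row of
    context_feat. Inputs have shape (H, W, C).
    """
    q = query_feat @ w_q
    k = context_feat @ w_k
    v = context_feat @ w_v
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("hic,hjc->hij", q, k) * scale  # (H, W, W)
    attn = softmax(scores, axis=-1)
    fused = np.einsum("hij,hjc->hic", attn, v)        # (H, W, C)
    return fused, attn

def bidirectional_cross_attention(left, right, params):
    # Both directions read the ORIGINAL features, so neither view is
    # privileged; the results are added back as residuals.
    from_right, attn_l = cross_attend(left, right, *params["left"])
    from_left, attn_r = cross_attend(right, left, *params["right"])
    return left + from_right, right + from_left, attn_l, attn_r

# Hypothetical toy features: 4 rows, 6 columns, 8 channels per view.
rng = np.random.default_rng(0)
H, W, C = 4, 6, 8
left = rng.standard_normal((H, W, C))
right = rng.standard_normal((H, W, C))
params = {
    side: [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
    for side in ("left", "right")
}
left_out, right_out, attn_l, attn_r = bidirectional_cross_attention(left, right, params)
```

In a unidirectional design only one of the two `cross_attend` calls would run, and only one view would receive information from the other. Here each branch is updated from the other view, which is the property the abstract attributes to MCAB.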