Mamba-GIE: A Visual State Space Models-Based Generalized Image Extrapolation Method Via Dual-Level Adaptive Feature Fusion

Ruoyi Zhang, Guotao Li, Shuyi Qu, Jun Wang, Jinye Peng
DOI: https://doi.org/10.1016/j.eswa.2024.125961
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Generalized Image Extrapolation (GIE) is a sub-task of image generation and a challenging ill-posed problem: given the central region of an image, the goal is to predict the unknown surrounding regions. Unfortunately, existing methods face a threefold dilemma. (1) Convolutional Neural Network (CNN)-based methods extract local details precisely, but their inductive bias makes them underperform at capturing global semantic information, so the generated images lack consistency in structure and layout. (2) Vision Transformer (ViT)-based methods, although superior at extracting global information, do not generate sufficiently fine-grained details and textures. (3) ViT-based approaches rely on the self-attention mechanism, which imposes a heavy computational burden when processing images and makes model training inefficient. We propose a novel model named Mamba-GIE, designed to balance information of different granularities effectively and to address these unresolved challenges in GIE tasks. At the macro level, Mamba-GIE adopts a U-shaped encoder–decoder architecture whose core basic block is an improved Hybrid State Space Model (Hybrid-SSM) block. Specifically, within each basic block, the input feature map is processed by two parallel branches: (1) a Mamba branch that extracts global information and (2) a CNN branch that handles local details. At the micro level, we introduce a dual-level adaptive feature fusion mechanism that fuses features adaptively both within (intra-) and between (inter-) Hybrid-SSM blocks. Extensive experiments on three public datasets demonstrate that our approach outperforms existing GIE methods on most evaluation metrics and in image generation quality. Comprehensive ablation studies and resource consumption assessments further demonstrate the efficiency and effectiveness of Mamba-GIE. Code: https://github.com/zrymsm/Mamba-GIE.
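The parallel-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under stated assumptions: a 1-D scalar sequence stands in for a flattened feature map, the global branch is a scalar selective state-space scan in the style of Mamba, the local branch is a 3-tap convolution, and the intra-block fusion is a per-position sigmoid gate. The function names `selective_scan`, `local_conv`, `adaptive_fuse` and `hybrid_block`, the parameters, and the gate formula are all hypothetical, because the abstract does not specify them.

```python
import math

# Hypothetical toy: a 1-D scalar sequence stands in for a flattened feature map.


def selective_scan(xs, A=1.0, B=1.0, C=1.0, w=1.0, bias=0.0):
    """Global branch: scalar selective SSM scan, O(L) in sequence length.

    The step size delta depends on the current input (the "selective" part),
    and the hidden state h carries information from every earlier position.
    """
    h, ys = 0.0, []
    for x in xs:
        delta = math.log1p(math.exp(w * x + bias))  # softplus keeps delta > 0
        a_bar = math.exp(-delta * A)                # zero-order-hold decay in (0, 1)
        b_bar = delta * B                           # simplified (Euler) input term
        h = a_bar * h + b_bar * x                   # recurrent state update
        ys.append(C * h)
    return ys


def local_conv(xs, kernel=(0.25, 0.5, 0.25)):
    """Local branch: 1-D convolution with zero padding; each output sees only
    len(kernel) neighbouring inputs."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(xs) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(xs))]


def adaptive_fuse(g, l, w_g=1.0, w_l=1.0, b=0.0):
    """Per-position gate alpha = sigmoid(w_g*g + w_l*l + b) blends the branches.
    In a trained model w_g, w_l and b would be learned parameters (assumption)."""
    out = []
    for gi, li in zip(g, l):
        alpha = 1.0 / (1.0 + math.exp(-(w_g * gi + w_l * li + b)))
        out.append(alpha * gi + (1.0 - alpha) * li)
    return out


def hybrid_block(xs):
    """Toy hybrid block: parallel global (SSM) and local (conv) branches, then fusion."""
    return adaptive_fuse(selective_scan(xs), local_conv(xs))


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(selective_scan(impulse)[-1])  # > 0: the first input still reaches the last position
print(local_conv(impulse)[-1])      # 0.0: outside the 3-tap receptive field
print(hybrid_block(impulse))
```

The impulse example shows the division of labour the abstract describes: the scan propagates information across the whole sequence while visiting each position once (linear cost, unlike the quadratic number of pairwise scores in self-attention), whereas the convolution only mixes nearby positions. A real visual SSM operates on 2-D, multi-channel feature maps with learned matrix parameters, and the inter-block half of the dual-level fusion is not modelled here.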
What problem does this paper attempt to address?