Abstract: Moire patterns arise when two similar repetitive patterns interfere, a phenomenon frequently observed during the capture of images or videos on screens. The color, shape, and location of moire patterns may differ across video frames, posing a challenge in learning information from adjacent frames and preserving temporal consistency. Previous video demoireing methods heavily rely on well-designed alignment modules, resulting in substantial computational burdens. Recently, Mamba, an improved version of the State Space Model (SSM), has demonstrated significant potential for modeling long-range dependencies with linear complexity, enabling efficient temporal modeling in video demoireing without requiring a specific alignment module. In this paper, we propose a novel alignment-free Raw video demoireing network with frequency-assisted spatio-temporal Mamba (DemMamba). The Spatial Mamba Block (SMB) and Temporal Mamba Block (TMB) are sequentially arranged to facilitate effective intra- and inter-relationship modeling in Raw videos with moire patterns. Within SMB, an Adaptive Frequency Block (AFB) is introduced to aid demoireing in the frequency domain. For TMB, a Channel Attention Block (CAB) is embedded to further enhance temporal information interactions by exploiting the inter-channel relationships among features. Extensive experiments demonstrate that our proposed DemMamba surpasses state-of-the-art approaches by 1.3 dB and delivers a superior visual experience.

What problem does this paper attempt to address?

This paper addresses video demoiréing, specifically the removal of moiré patterns from Raw videos. Moiré arises from interference between two similar repeating patterns and often appears when a camera captures images or videos of a screen. The color, shape, and position of moiré can vary from frame to frame, which makes it difficult to learn information from adjacent frames and to maintain temporal consistency. Existing video demoiréing methods usually rely on elaborately designed alignment modules, which impose a significant computational burden, especially on high-resolution, long video sequences.

To address these problems, the paper proposes an alignment-free Raw video demoiréing network called DemMamba, built on a frequency-assisted spatio-temporal Mamba model. Its main components are:

1. **Spatial Mamba Block (SMB) and Temporal Mamba Block (TMB)**: The two blocks are arranged sequentially to model intra- and inter-frame relationships in Raw videos with moiré patterns.
2. **Adaptive Frequency Block (AFB)**: The AFB is placed inside the SMB to aid demoiréing in the frequency domain.
3. **Channel Attention Block (CAB)**: The CAB is embedded in the TMB to further enhance temporal information interaction by exploiting inter-channel relationships among features.

With these designs, DemMamba removes moiré effectively while maintaining temporal consistency, and it is more efficient and produces better visual results than existing methods.
### Formula Representation

The formulas in the paper mainly concern the State Space Model (SSM) and its discretization. A continuous-time linear time-invariant (LTI) system is described by the linear ordinary differential equation (ODE):

\[ h'(t) = Ah(t) + Bx(t), \]
\[ y(t) = Ch(t) + Dx(t), \]

where \( h(t)\in\mathbb{R}^{N} \) is the hidden state, \( N \) is the state size, \( A\in\mathbb{R}^{N\times N} \), \( B\in\mathbb{R}^{N\times1} \), \( C\in\mathbb{R}^{1\times N} \), and \( D\in\mathbb{R} \).

Discretization uses the zero-order hold (ZOH) rule with step size \( \Delta \), which yields the discrete parameters \( \bar{A} \) and \( \bar{B} \):

\[ \bar{A} = \exp(\Delta A), \]
\[ \bar{B} = (\Delta A)^{-1}(\exp(\Delta A)-I)\cdot\Delta B. \]

The discretized model can be written in recurrent neural network (RNN) form:

\[ h_k = \bar{A}h_{k-1}+\bar{B}x_k, \]
\[ y_k = Ch_k+Dx_k. \]

Unrolling the recurrence from a zero initial state gives the equivalent convolutional neural network (CNN) form:

\[ \bar{K}\triangleq(C\bar{B},C\bar{A}\bar{B},\ldots,C\bar{A}^{L-1}\bar{B}), \]
\[ y = x\circledast\bar{K}+Dx, \]

where \( L \) is the length of the input sequence, \( \circledast \) denotes the convolution operation, and \( \bar{K}\in\mathbb{R}^L \) is the structured convolution kernel.

### Summary

This paper aims to develop an efficient, alignment-free Raw video demoiréing method. By introducing a frequency-assisted spatio-temporal Mamba model, it addresses the high computational complexity and poor temporal consistency of existing methods. The experimental results show that DemMamba outperforms existing methods in both quantitative and qualitative evaluations.
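The state space formulas above can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a diagonal state matrix with nonzero entries (so the matrix exponential and inverse in the ZOH rule reduce to elementwise operations), a scalar input and output, and a zero initial state. The function names and the toy parameter values are hypothetical and chosen only for illustration.

```python
import math


def zoh_discretize(a_diag, b, delta):
    """ZOH discretization for a diagonal A (assumption: every entry is nonzero).

    For diagonal A the matrix formulas reduce elementwise to
    A_bar = exp(delta * a) and B_bar = (exp(delta * a) - 1) / a * b.
    """
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(math.exp(delta * a) - 1.0) / a * bi for a, bi in zip(a_diag, b)]
    return a_bar, b_bar


def ssm_recurrent(a_bar, b_bar, c, d, xs):
    """Recurrent form: h_k = A_bar h_{k-1} + B_bar x_k, y_k = C h_k + D x_k."""
    h = [0.0] * len(a_bar)  # zero initial state
    ys = []
    for x in xs:
        h = [ab * hi + bb * x for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)) + d * x)
    return ys


def ssm_kernel(a_bar, b_bar, c, length):
    """Convolution kernel with entries K[j] = C A_bar^j B_bar, j = 0..length-1."""
    return [
        sum(ci * (ab ** j) * bb for ci, ab, bb in zip(c, a_bar, b_bar))
        for j in range(length)
    ]


def ssm_convolutional(a_bar, b_bar, c, d, xs):
    """Convolutional form: causal convolution of x with K, plus the skip term D x."""
    k = ssm_kernel(a_bar, b_bar, c, len(xs))
    return [
        sum(k[j] * xs[n - j] for j in range(n + 1)) + d * xs[n]
        for n in range(len(xs))
    ]


if __name__ == "__main__":
    # Toy parameters (hypothetical values, N = 2).
    a_bar, b_bar = zoh_discretize([-1.0, -2.0], [1.0, 1.0], delta=0.1)
    c, d = [0.5, -0.25], 0.1
    xs = [1.0, 0.0, 2.0, -1.0]
    y_rec = ssm_recurrent(a_bar, b_bar, c, d, xs)
    y_conv = ssm_convolutional(a_bar, b_bar, c, d, xs)
    print(max(abs(p - q) for p, q in zip(y_rec, y_conv)))
```

The two forms compute the same outputs up to floating-point rounding. The recurrent form processes the sequence step by step in time linear in its length, while the convolutional form exposes the whole kernel at once. The naive double loop used here for the convolution is quadratic in the sequence length and is written for clarity rather than speed.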

DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

Video Demoiréing with Deep Temporal Color Embedding and Video-Image Invertible Consistency

Direction-aware Video Demoireing with Temporal-guided Bilateral Learning

MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

VideoMamba: Spatio-Temporal Selective State Space Model

MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging

Recaptured Raw Screen Image and Video Demoiréing via Channel and Spatial Modulations

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

VideoMamba: State Space Model for Efficient Video Understanding

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution

MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation

Real-Time Image Demoireing on Mobile Devices

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

RAWMamba: Unified sRGB-to-RAW De-rendering With State Space Model

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

ReMamber: Referring Image Segmentation with Mamba Twister

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition