Abstract: Video snapshot compressive imaging (SCI) uses a low-speed 2D detector to capture high-speed scenes: the dynamic scene is modulated by different masks and then compressed into a single snapshot measurement. A reconstruction algorithm is then needed to recover the high-speed video frames. Although state-of-the-art (SOTA) deep learning-based reconstruction algorithms have achieved impressive results, excessive model complexity and GPU memory limitations leave two challenges: (1) these models incur high computational cost, and (2) they usually cannot reconstruct large-scale video frames at high compression ratios. To address these issues, we develop an efficient network for video SCI, dubbed EfficientSCI++, that uses hierarchical residual-like connections and a hybrid CNN-Transformer structure within a single residual block. The EfficientSCI++ network explores spatial-temporal correlation effectively by using convolution in the spatial domain and a Transformer in the temporal domain. We demonstrate for the first time that a UHD color video ( ) with a high compression ratio (40) can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 34 dB. Moreover, a mixed-precision model is trained to further accelerate video SCI reconstruction and reduce the memory footprint. Extensive results on both simulation and real data demonstrate that, compared with previous SOTA methods, our proposed EfficientSCI++ and EfficientSCI achieve comparable reconstruction quality with much lower computational cost and better real-time performance. Code is available at https://github.com/mcao92/EfficientSCI-plus-plus.
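The measurement process described in the abstract, where each high-speed frame is modulated by its own mask and the results are summed into one snapshot, can be illustrated with a minimal sketch. This is not the authors' code: the function name `sci_measure`, the random binary masks, the toy frame size, and the absence of noise are all assumptions made for illustration.

```python
import random


def sci_measure(frames, masks):
    """Compress T frames into one snapshot: Y = sum_t (mask_t * frame_t), element-wise.

    Hypothetical helper for illustration; frames and masks are lists of
    T two-dimensional lists with identical height and width.
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Each pixel is modulated by its mask value, then accumulated.
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


# Toy example (assumed sizes): compression ratio T = 4, frames of 2 x 3 pixels.
rng = random.Random(0)
T, H, W = 4, 2, 3
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
# Random binary masks are an assumption; real systems use a physical modulator.
masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(T)]
snapshot = sci_measure(frames, masks)
print(len(snapshot), len(snapshot[0]))  # 2 3
```

The snapshot has the spatial size of a single frame regardless of T, which is why a reconstruction algorithm is needed to recover the T original frames from it.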
