DMDC: A cross-attention network for dynamic mask-based dual-camera snapshot hyperspectral photography

Zeyu Cai, Ziyu Zhang, Chengqian Jin, Feipeng Da
DOI: https://doi.org/10.1007/s00371-024-03700-z
IF: 2.835
2024-11-09
The Visual Computer
Abstract: Spectral images enrich the material information of a reconstructed scene and play an essential role in computer visualization. Coded aperture snapshot spectral imaging (CASSI) for dynamic scenes still faces two reconstruction problems: 1) a single input limits the network's performance; 2) because the mask coding is fixed, the potential of the spatial light modulator (SLM) has not been fully exploited. This paper proposes a cross-attention-based dual-stream network for a dual-camera CASSI system. We argue that RGB images and CASSI measurements are projections of the same 3D spectral cube onto different 2D spaces, and that fusing the spatial features of the RGB image with the spectral features of the CASSI measurement improves reconstruction quality. On this basis, we embed a dynamic mask module in front of the cross-attention-based dual-stream network to further improve the system's reconstruction quality. Specifically, the dynamic mask module uses the RGB image to pre-learn the spatial feature distribution of the scene and then guides the SLM in encoding the CASSI measurement. Finally, the cross-attention-based dual-stream network reconstructs the scene from the RGB and CASSI images, yielding high-quality results. Comprehensive experiments on various datasets demonstrate the superior performance of our method: at similar speeds, it provides a 4.0 dB improvement over existing state-of-the-art (SOTA) methods on both clean and noisy datasets. In the snapshot video imaging task, the single-snapshot imaging time of DMDC-1stg is less than 50 ms, which verifies the feasibility of our method. (The code has been released at https://github.com/caizeyu1992/DMDC.)
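The abstract's central argument is that the RGB camera and the CASSI camera see two different 2D projections of one spectral cube. The toy forward model below is a minimal sketch of that idea, not the DMDC code. It assumes a standard single-disperser CASSI (mask each band, shift band k by k*step detector columns, sum on the sensor) and a linear RGB camera (each channel is a weighted sum of bands). The function names, the shift size, and the response values are all hypothetical. In DMDC the mask would be produced by the dynamic mask module from the RGB image. Here it is simply passed in.

```python
"""Hypothetical toy forward model for a dual-camera CASSI setup (not the DMDC implementation)."""
from typing import List

Cube = List[List[List[float]]]  # assumed layout: cube[band][row][col]
Image = List[List[float]]       # image[row][col]


def cassi_measurement(cube: Cube, mask: Image, step: int = 1) -> Image:
    """Assumed single-disperser CASSI: code every band with the same 2D mask,
    shift band k by k*step columns (dispersion), then sum all bands on the detector."""
    bands, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    out_cols = cols + step * (bands - 1)  # dispersion widens the detector footprint
    y = [[0.0] * out_cols for _ in range(rows)]
    for k in range(bands):
        for r in range(rows):
            for c in range(cols):
                y[r][c + step * k] += mask[r][c] * cube[k][r][c]
    return y


def rgb_projection(cube: Cube, response: List[List[float]]) -> List[Image]:
    """Assumed linear RGB camera: channel ch = sum over bands of response[ch][band] * cube[band]."""
    bands, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    return [
        [[sum(response[ch][k] * cube[k][r][c] for k in range(bands)) for c in range(cols)]
         for r in range(rows)]
        for ch in range(len(response))
    ]


if __name__ == "__main__":
    cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 bands, 2x2 pixels
    mask = [[1, 0], [0, 1]]                        # hypothetical binary SLM pattern
    print(cassi_measurement(cube, mask))           # [[1.0, 5.0, 0.0], [0.0, 4.0, 8.0]]
    response = [[1, 0], [0, 1], [0.5, 0.5]]        # hypothetical 3-channel response
    print(rgb_projection(cube, response)[2])       # [[3.0, 4.0], [5.0, 6.0]]
```

The CASSI measurement keeps band-specific information (each band lands at a different column offset, selectively gated by the mask) but mixes bands spatially, whereas the RGB image keeps full spatial alignment but collapses the spectrum into three channels. This complementarity is what the dual-stream network is described as exploiting, and it is also why changing the mask, as the dynamic mask module does, changes what the CASSI branch can capture.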
computer science, software engineering