Vision Transformer-Based, High-Fidelity, Computer-Generated Holography

Zhenxing Dong, Chao Xu, Yaoqi Tang, Yuye Ling, Yan Li, Yikai Su
DOI: https://doi.org/10.1117/12.2648987
2023-01-01
Abstract: Current learning-based Computer-Generated Holography (CGH) algorithms often use Convolutional Neural Network (CNN)-based architectures. However, CNN-based non-iterative methods mostly underperform State-Of-The-Art (SOTA) iterative algorithms such as Stochastic Gradient Descent (SGD) in terms of display quality. Inspired by the global attention mechanism of the Vision Transformer (ViT), we propose a novel unsupervised autoencoder-based ViT for generating phase-only holograms. Specifically, for the encoding part, we use Uformer to generate the holograms. For the decoding part, we use the Angular Spectrum Method (ASM) instead of a learnable network to reconstruct the target images. To validate the effectiveness of the proposed method, numerical simulations and optical reconstructions are performed to compare our proposal against both iterative algorithms and CNN-based techniques. In the numerical simulations, the PSNR and SSIM of the proposed method are 26.78 dB and 0.832, which are 4.02 dB and 0.09 higher than those of the CNN-based method, respectively. Moreover, the proposed method produces fewer speckles and achieves a higher display quality than other CGH methods in experiments. We suggest the improvement might be ascribed to the ViT’s global attention mechanism, which is more suitable for learning the cross-domain mapping from the image (spatial) domain to the hologram (Fourier) domain. We believe the proposed ViT-based CGH algorithm could be a promising candidate for future real-time high-fidelity holographic displays.
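The fixed, non-learnable decoder described in the abstract can be illustrated with a minimal NumPy sketch of angular spectrum propagation. This is not the authors' implementation: the function names (`asm_propagate`, `reconstruct_amplitude`, `psnr`) and the optical parameters (520 nm wavelength, 8 µm pixel pitch, 0.2 m distance) are assumptions chosen for illustration, and the Uformer encoder is replaced here by a random phase pattern.

```python
import numpy as np


def asm_propagate(field, wavelength, pixel_pitch, distance):
    """Propagate a complex field over `distance` with the angular spectrum method."""
    ny, nx = field.shape
    # Spatial frequencies (cycles per metre) on the FFT grid.
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; non-positive values are evanescent.
    kz_sq = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    propagating = kz_sq > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, kz_sq, 0.0))
    # Transfer function: pure phase delay for propagating waves, zero otherwise.
    H = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * H)


def reconstruct_amplitude(phase, wavelength, pixel_pitch, distance):
    """Decode a phase-only hologram into the amplitude at the image plane."""
    hologram_field = np.exp(1j * phase)  # unit amplitude, phase-only modulation
    return np.abs(asm_propagate(hologram_field, wavelength, pixel_pitch, distance))


def psnr(target, recon, peak=1.0):
    """Peak signal-to-noise ratio in dB between two amplitude images."""
    mse = np.mean((target - recon) ** 2)
    return 10 * np.log10(peak ** 2 / mse)


# Hypothetical parameters, not taken from the paper.
WAVELENGTH = 520e-9   # metres
PIXEL_PITCH = 8e-6    # metres
DISTANCE = 0.2        # metres

rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(64, 64))  # stand-in for the encoder output
amplitude = reconstruct_amplitude(phase, WAVELENGTH, PIXEL_PITCH, DISTANCE)
```

In an unsupervised setup of the kind the abstract describes, the reconstructed amplitude would be compared with the target image and the resulting loss backpropagated through the propagation step to the encoder; doing so requires an autodiff framework rather than plain NumPy.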