Cross-Modal Information Recovery and Enhancement Using Multiple-Input–Multiple-Output Variational Autoencoder

Jessica E. Liang
DOI: https://doi.org/10.1109/jiot.2024.3396401
IF: 10.6
2024-07-27
IEEE Internet of Things Journal
Abstract: Motivated by the cross-modal information processing mechanisms of the human brain, vertebrates, and invertebrates, we propose a multiple-input–multiple-output (MIMO) variational autoencoder (VAE) and apply it to cross-modal information recovery and enhancement. For a cross-modal system with two modalities, our MIMO VAE consists of two encoders and two decoders. We use the human brain's cross-modal information fusion mechanism to integrate signals from different modalities in the MIMO VAE. To reduce the computational complexity of the MIMO VAE, we propose a linearization of the encoders using a compression matrix. The space and time complexity of the proposed MIMO VAE are analyzed. A theoretical proof shows that the MIMO VAE can achieve lossless performance subject to certain conditions. Simulation results show that our linearized encoder VAE (LE-VAE) performs much better than the current VAE with Kullback–Leibler (KL) divergence (KL-VAE), and illustrate that the MIMO VAE can successfully perform visual and audio information recovery and enhancement. Our weighted approach for visual and audio enhancement performs better than the unweighted approach. The MIMO VAE could be applied to multimodal Internet of Things and other systems.
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic
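
The two-encoder, two-decoder layout with linearized encoders described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's method: it omits the variational (sampling and KL) part entirely and assumes a toy setting in which both modalities are linear views of one shared latent vector. All names, dimensions, and the fusion weight (`B_visual`, `A_audio`, `fuse`, `w_visual`, and so on) are hypothetical choices made for illustration. Under that assumption, taking each compression matrix as the pseudo-inverse of its modality's mixing matrix recovers the latent exactly, so one modality can be rebuilt from the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two modality dimensions and a shared latent dimension.
d_visual, d_audio, k = 12, 8, 3

# Toy assumption (not from the paper): each modality is a linear view
# of the same latent vector s, i.e. x = B @ s.
B_visual = rng.standard_normal((d_visual, k))
B_audio = rng.standard_normal((d_audio, k))

# Linearized encoders: one k x d compression matrix per modality.
# The pseudo-inverse is an illustrative choice that inverts the toy model.
A_visual = np.linalg.pinv(B_visual)
A_audio = np.linalg.pinv(B_audio)


def encode(A, x):
    """Linear encoder: compress an observation into a latent code."""
    return A @ x


def fuse(z_visual, z_audio, w_visual=0.5):
    """Weighted average of the two latent codes (weight is illustrative)."""
    return w_visual * z_visual + (1.0 - w_visual) * z_audio


def decode(B, z):
    """Linear decoder: map a latent code back to one modality."""
    return B @ z


# One paired sample generated from a shared latent.
s = rng.standard_normal(k)
x_visual = B_visual @ s
x_audio = B_audio @ s

# Cross-modal recovery: rebuild the audio signal from the visual one alone.
audio_from_visual = decode(B_audio, encode(A_visual, x_visual))
recovery_error = np.linalg.norm(audio_from_visual - x_audio)

# Enhancement: encode noisy inputs, fuse the codes, decode both modalities.
noisy_visual = x_visual + 0.1 * rng.standard_normal(d_visual)
noisy_audio = x_audio + 0.1 * rng.standard_normal(d_audio)
z = fuse(encode(A_visual, noisy_visual), encode(A_audio, noisy_audio), w_visual=0.6)
visual_hat = decode(B_visual, z)
audio_hat = decode(B_audio, z)

print(f"cross-modal recovery error: {recovery_error:.2e}")
```

In this toy model the recovery is exact up to floating-point error because each mixing matrix has full column rank, so its pseudo-inverse is a left inverse. The conditions under which the paper proves lossless performance may differ and are not stated in the abstract.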