Introduction to the Special Issue on Deep Learning for Multi-Modal Intelligence Across Speech, Language, Vision, and Heterogeneous Signals

Xiaodong He, Li Deng, Richard Rose, Minlie Huang, Isabel Trancoso, Chao Zhang
DOI: https://doi.org/10.1109/jstsp.2020.2989852
IF: 7.695
2020-01-01
IEEE Journal of Selected Topics in Signal Processing
Abstract: The ten papers included in this special section focus on deep learning for multi-modal intelligence across speech, language, vision, and heterogeneous signals. Thanks to disruptive advances in deep learning, significant progress has been made in single-modality artificial intelligence (AI) applications such as speech recognition, speech synthesis, image classification, object detection, machine translation, and reading comprehension. However, many AI problems require more than one modality, and techniques developed for different modalities can often be successfully cross-fertilized. Studies of modeling and learning approaches across multiple modalities are therefore of great interest. This special issue brings together a diverse but complementary set of contributions on emerging deep learning methods for problems based on multiple modalities, including speech, text, image, and video.