A Multimodal Clustering Framework with Cross Reconstruction Autoencoders

Qianli Zhao, Linlin Zong, Xianchao Zhang, Yuangang Li, Xiaorui Tang
DOI: https://doi.org/10.1109/access.2020.3040644
IF: 3.9
2020-01-01
IEEE Access
Abstract: Multimodal clustering algorithms partition a multimodal dataset into disjoint clusters. Common feature extraction is a key part of multimodal clustering algorithms. Recently, deep neural networks have shown high performance on latent feature extraction. However, existing works have not fully explored cross-modal distribution similarity using deep neural networks. We present a deep multimodal clustering framework with cross reconstruction. Feature extraction applies global cross reconstruction and local cross reconstruction, respectively, to enforce early fusion among different modalities. Analysis shows that both cross reconstruction networks reduce the Wasserstein distance between latent feature distributions, which indicates that the proposed framework ensures the distribution similarity of common latent features. Experimental results on benchmark datasets demonstrate superiority over existing works.
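The general idea of cross reconstruction can be illustrated with a minimal sketch. It is not the paper's global or local variant, whose details the abstract does not give. It assumes two modalities, linear encoders and decoders, and a mean-squared-error loss. All function and key names here are hypothetical. The last helper computes the 1-D Wasserstein-1 distance between two equal-size samples, which for that special case is the mean absolute difference of the sorted values.

```python
import random

def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both as lists of rows)."""
    return [[sum(row[i] * b[i][j] for i in range(len(b)))
             for j in range(len(b[0]))] for row in a]

def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian weights; an arbitrary initialisation for the sketch."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(dim_x, dim_y, dim_z, seed=0):
    """Assumed linear encoder/decoder pair per modality (hypothetical names)."""
    rng = random.Random(seed)
    return {
        "enc_x": random_matrix(dim_x, dim_z, rng),
        "enc_y": random_matrix(dim_y, dim_z, rng),
        "dec_x": random_matrix(dim_z, dim_x, rng),
        "dec_y": random_matrix(dim_z, dim_y, rng),
    }

def cross_reconstruction_losses(params, x, y):
    """Self and cross reconstruction errors for paired samples x and y."""
    z_x = matmul(x, params["enc_x"])  # latent code from modality X
    z_y = matmul(y, params["enc_y"])  # latent code from modality Y
    return {
        # Each modality reconstructed from its own latent code.
        "self_x": mse(matmul(z_x, params["dec_x"]), x),
        "self_y": mse(matmul(z_y, params["dec_y"]), y),
        # Each modality reconstructed from the OTHER modality's code.
        "cross_x": mse(matmul(z_y, params["dec_x"]), x),
        "cross_y": mse(matmul(z_x, params["dec_y"]), y),
    }

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(u) != len(v):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(p - q) for p, q in zip(sorted(u), sorted(v))) / len(u)
```

Minimising the cross terms alongside the self terms pushes the two encoders toward codes that either decoder can use, which is one intuition for why the latent distributions of the modalities would move closer together.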