Deep Self-Supervised t-SNE for Multi-modal Subspace Clustering

Qianqian Wang, Wei Xia, Zhiqiang Tao, Quanxue Gao, Xiaochun Cao
DOI: https://doi.org/10.1145/3474085.3475319
2021-01-01
Abstract: Existing multi-modal subspace clustering methods, which aim to exploit the correlation information between different modalities, have achieved promising preliminary results. However, these methods may be unable to handle real-world problems with complex heterogeneous structures between modalities, since strong heterogeneity makes it difficult to directly learn a discriminative shared self-representation for multi-modal clustering. To tackle this problem, we propose a deep Self-supervised t-SNE method (StSNE) for multi-modal subspace clustering, which learns soft label features with multi-modal encoders and uses the common label feature to supervise the soft label features of each modality through adversarial training and reconstruction networks. Specifically, the proposed StSNE consists of four components: 1) multi-modal convolutional encoders; 2) a self-supervised t-SNE module; 3) a self-expressive layer; 4) multi-modal convolutional decoders. Multi-modal data are fed to the encoders to obtain soft label features, and the self-supervised t-SNE module is added to make full use of the label information shared among the modalities. Simultaneously, the latent representations produced by the encoders are constrained by a self-expressive layer to capture the hierarchical information of each modality, and the decoders then reconstruct the encoded features to preserve the structure of the original data. Experimental results on several public datasets demonstrate the superior clustering performance of the proposed method over state-of-the-art methods.
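The abstract does not spell out the self-expressive layer, so the following is only a minimal sketch of the generic idea as it is commonly used in subspace clustering: each latent sample is approximated as a combination of the other samples, Z ≈ CZ, with the diagonal of C forced to zero. The function name, the ridge penalty `lam`, the projected gradient descent solver, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def self_expressive_coefficients(Z, lam=0.1, steps=500):
    """Hypothetical helper: fit C so that Z ~= C @ Z with diag(C) = 0.

    Minimizes ||Z - C Z||_F^2 + lam * ||C||_F^2 by projected gradient
    descent. Z has shape (n_samples, latent_dim).
    """
    n = Z.shape[0]
    G = Z @ Z.T  # Gram matrix of the latent codes
    # Step size 1/L, where L = 2 * (lambda_max(G) + lam) is the
    # Lipschitz constant of the gradient.
    lr = 1.0 / (2.0 * (np.linalg.eigvalsh(G)[-1] + lam))
    C = np.zeros((n, n))
    for _ in range(steps):
        grad = 2.0 * (C @ G - G) + 2.0 * lam * C
        C -= lr * grad
        np.fill_diagonal(C, 0.0)  # a sample may not represent itself
    return C


# Toy latent codes (assumed data): two 1-D subspaces embedded in 3-D.
rng = np.random.default_rng(0)
basis_a = np.array([[1.0, 0.0, 0.0]])
basis_b = np.array([[0.0, 1.0, 1.0]])
Z = np.vstack([rng.normal(size=(5, 1)) @ basis_a,
               rng.normal(size=(5, 1)) @ basis_b])

C = self_expressive_coefficients(Z)
# Symmetric affinity commonly passed to spectral clustering.
affinity = np.abs(C) + np.abs(C).T
```

In subspace clustering pipelines, an affinity matrix of this kind is typically handed to spectral clustering to obtain the final cluster assignments; whether StSNE follows exactly this recipe is not stated in the abstract.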