Transformer Autoencoder for K-means Efficient Clustering

Wenhao Wu, Weiwei Wang, Xixi Jia, Xiangchu Feng
DOI: https://doi.org/10.1016/j.engappai.2024.108612
IF: 8
2024-01-01
Engineering Applications of Artificial Intelligence
Abstract: As a fundamental unsupervised learning task, clustering has been widely applied in exploratory data analysis in computer vision, pattern recognition, and data mining. Among existing clustering methods, K-means is the most popular due to its simplicity and computational efficiency. However, the ubiquitous high dimensionality of data challenges both the effectiveness and the efficiency of the K-means algorithm. Fortunately, deep neural networks provide a powerful means of learning low-dimensional features. To optimize feature learning and K-means clustering jointly, we present a new deep clustering network called Transformer AutoEncoder for K-means Efficient clustering (TAKE). It consists of two modules: the Transformer AutoEncoder (TAE) for feature learning and the KNet for clustering. The TAE incorporates the transformer structure to learn global features and a contrastive learning mechanism to enhance feature discrimination. The KNet is constructed by unrolling the accelerated projected gradient descent iterations of the relaxed K-means model. The network is trained in two phases: pretraining and clustering. In pretraining, the TAE is optimized by minimizing a cosine similarity-based reconstruction loss, a contrastive loss (CL), and a convex combination loss (CCL). The CCL encourages the features of augmented neighbor data to lie in a convex hull, making them K-means friendly. In the clustering phase, the TAE and the KNet are optimized jointly by minimizing the reconstruction loss and the K-means clustering loss. The clustering results are obtained by forward inference of the KNet. Extensive experiments show that the proposed method is highly effective in unsupervised representation learning and clustering.
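To make the KNet idea concrete, the following is a minimal NumPy sketch of accelerated projected gradient descent on one common relaxation of K-means. It is not the authors' implementation. The abstract does not state the exact relaxed model, so the sketch assumes the objective 0.5 * ||Z - A C||_F^2 with each row of the assignment matrix A constrained to the probability simplex, fixed centroids C, and FISTA-style momentum. The function names `project_rows_to_simplex` and `relaxed_kmeans_apg` are hypothetical. Each loop iteration plays the role of one layer in an unrolled network.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]               # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0              # cumulative sums minus the target sum 1
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                      # True up to the active support size
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)    # per-row shift
    return np.maximum(V - theta[:, None], 0.0)

def relaxed_kmeans_apg(Z, C, n_iters=50):
    """Soft assignments A minimizing 0.5 * ||Z - A C||_F^2 over row-simplex A.

    Assumed relaxation (not taken from the paper): Z is (n, d) features,
    C is (k, d) fixed centroids. Each iteration corresponds to one
    unrolled layer.
    """
    n, k = Z.shape[0], C.shape[0]
    A = np.full((n, k), 1.0 / k)                  # uniform initial assignments
    Y = A.copy()                                  # extrapolated point
    t = 1.0                                       # momentum parameter
    L = np.linalg.norm(C @ C.T, 2) + 1e-12        # Lipschitz constant of the gradient
    for _ in range(n_iters):
        grad = (Y @ C - Z) @ C.T                  # gradient of the objective w.r.t. A at Y
        A_next = project_rows_to_simplex(Y - grad / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = A_next + ((t - 1.0) / t_next) * (A_next - A)
        A, t = A_next, t_next
    return A

# Usage: two well-separated groups of features with known centroids.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
C = np.array([[0.0, 0.0], [5.0, 5.0]])
A = relaxed_kmeans_apg(Z, C)
labels = A.argmax(axis=1)                         # hard labels from soft assignments
```

In a learned variant, quantities such as the centroids or step sizes would presumably become trainable parameters of the unrolled layers, so that clustering reduces to a forward pass; that detail is an assumption here and is not specified in the abstract.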