Exploring structural components in autoencoder-based data clustering
Sujoy Chatterjee, Suvra Jyoti Choudhury
DOI: https://doi.org/10.1016/j.engappai.2024.109562
IF: 8
2024-11-28
Engineering Applications of Artificial Intelligence
Abstract: Clustering is a fundamental machine-learning task that has received extensive attention in the literature. Traditional clustering approaches rest on the premise that data are represented as vectorized features learned through various representation learning techniques. As data grow more intricate, conventional clustering methods can no longer handle their high dimensionality. Numerous representation learning strategies based on deep architectures have been presented over the years, particularly deep unsupervised learning, owing to its superiority over conventional approaches. In most existing research, especially in autoencoder-based approaches, only the pairwise distance information of points in the original data space is retained in the latent space. However, it is important to combine this with additional preserved factors, such as the variance in the original data and the independent components in the latent space. In addition, the model's stability under noisy conditions is crucial. This paper presents a novel method for clustering data that combines an autoencoder (AE), principal component analysis (PCA), and independent component analysis (ICA) to capture a relevant latent space representation. Dimensionality is further lowered to improve clustering effectiveness by employing two dimensionality reduction algorithms, i.e., PCA and t-distributed stochastic neighbor embedding (t-SNE). The proposed technique produces more precise and reliable clustering by utilizing the advantages of both approaches. To compare the efficiency of the proposed methods with conventional clustering methods and stand-alone autoencoders, we conduct comprehensive experiments on 13 real-life datasets. The outcomes demonstrate the approach's promising potential for addressing complicated clustering problems and, importantly, its effectiveness even under noisy conditions.
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary
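The abstract names the ingredients (AE, PCA, ICA, a final dimensionality reduction, then clustering) but not how they are wired together. The sketch below is one possible reading, not the authors' method: it assumes the three representations are simply concatenated, uses scikit-learn's `MLPRegressor` as a stand-in one-hidden-layer autoencoder, uses PCA (not t-SNE) for the final 2-D reduction, and clusters with k-means on synthetic blobs. The function names `ae_latent`, `combined_embedding`, and `cluster`, and all hyperparameters, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, FastICA
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def ae_latent(X, n_latent, seed=0):
    """Fit a one-hidden-layer autoencoder (X -> X) and return its hidden code.

    Assumption: MLPRegressor is only a lightweight stand-in for a real AE.
    """
    ae = MLPRegressor(hidden_layer_sizes=(n_latent,), activation="tanh",
                      max_iter=2000, random_state=seed)
    ae.fit(X, X)
    # Recompute the hidden-layer activations from the learned weights.
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def combined_embedding(X, n_latent=4, seed=0):
    """Concatenate AE, PCA, and ICA features, then reduce to 2-D with PCA.

    Assumption: plain concatenation; the paper may combine them differently.
    """
    Xs = StandardScaler().fit_transform(X)
    z_ae = ae_latent(Xs, n_latent, seed)
    z_pca = PCA(n_components=n_latent, random_state=seed).fit_transform(Xs)
    z_ica = FastICA(n_components=n_latent, random_state=seed,
                    max_iter=1000).fit_transform(Xs)
    Z = StandardScaler().fit_transform(np.hstack([z_ae, z_pca, z_ica]))
    return PCA(n_components=2, random_state=seed).fit_transform(Z)


def cluster(X, n_clusters, seed=0):
    """Cluster the 2-D combined embedding with k-means."""
    emb = combined_embedding(X, seed=seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(emb)
    return emb, labels


if __name__ == "__main__":
    # Synthetic data for illustration only; not one of the paper's datasets.
    X, _ = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)
    emb, labels = cluster(X, n_clusters=3)
    print(emb.shape, np.bincount(labels))
```

Swapping the final `PCA(n_components=2)` for `sklearn.manifold.TSNE(n_components=2)` would give the t-SNE variant of the last step mentioned in the abstract, at a higher runtime cost.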