Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction

Zhenhua Shi, Dongrui Wu, Jian Huang, Yu-Kai Wang, Chin-Teng Lin
DOI: https://doi.org/10.48550/arXiv.2001.03103
2020-01-12
Abstract: Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high-dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and use a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the challenge of preserving both the global and the local structure of the data when reducing the dimensionality of high-dimensional data. Specifically:

1. **Limitations of existing methods**:
   - **Methods that preserve only the global structure (such as PCA)**: These methods are usually sensitive to outliers because they focus mainly on the overall distribution of the data.
   - **Methods that preserve only the local structure (such as LPP)**: These methods are usually unsupervised, so they cannot use label information, and they rely on a fixed similarity graph.
2. **The proposed new method**:
   - The paper proposes a new linear dimensionality reduction method called **Supervised Discriminative Sparse PCA with Adaptive Neighbors (SDSPCAAN)**.
   - This method combines **SDSPCA** (supervised discriminative sparse PCA) and **PCAN** (projected clustering with adaptive neighbors), so that it can use global and local data structure information as well as label information during dimensionality reduction.
3. **Objective**:
   - By integrating global structure, local structure, and label information, SDSPCAAN aims to achieve more effective dimensionality reduction, with better robustness and classification performance on noisy high-dimensional data.

### Formula summary

- **SDSPCA optimization problem**:
  \[ \min_{W, G, Q} \|X - QW^T\|_F^2 + \alpha\|Y - QG^T\|_F^2 + \beta\|Q\|_{2,1} \]
  where \(G\in\mathbb{R}^{c\times k}\), and \(\alpha\) and \(\beta\) are scaling weights.
- **PCAN optimization problem**:
  \[ \min_{W, F, S}\sum_{i,j = 1}^n\left(\|W^T x_i - W^T x_j\|_2^2 S_{ij} + \gamma_i S_{ij}^2 + \lambda\|f_i - f_j\|_2^2 S_{ij}\right) \]
- **SDSPCAAN optimization problem**:
  \[ \min_{Q, S}\|X - QQ^T X\|_F^2 + \alpha\|Y - QQ^T Y\|_F^2 + \beta\|Q\|_{2,1} + \frac{1}{2}\delta\left[2\text{Tr}(Q^T X X^T L X X^T Q) + \text{Tr}(S^T\Gamma S) + 2\lambda\text{Tr}(Y^T L Y)\right] \]
  where \(\delta>0\) is a scaling weight.

In this way, SDSPCAAN uses both global and local data structure information together with the labels. According to the abstract, classification experiments on nine high-dimensional datasets validated its effectiveness and robustness.
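To make the SDSPCAAN objective concrete, the following NumPy sketch evaluates it for given \(Q\) and \(S\). It is only an illustration of the formula as written in this summary, not the authors' optimization procedure. Several details are assumptions that the summary does not state: samples are stored as rows of \(X\) (shape \(n\times d\)), \(Y\) is a one-hot label matrix, \(L\) is the Laplacian of the symmetrized graph \((S + S^T)/2\), and \(\Gamma\) is the diagonal matrix of the \(\gamma_i\) from the PCAN formula. The function names are hypothetical.

```python
import numpy as np


def l21_norm(Q):
    """L2,1 norm: sum of the Euclidean norms of the rows of Q."""
    return float(np.linalg.norm(Q, axis=1).sum())


def graph_laplacian(S):
    """Laplacian L = D - A of the symmetrized graph A = (S + S^T) / 2.

    Assumption: the summary does not define L; this is a common
    construction for adaptive-neighbor methods.
    """
    A = (S + S.T) / 2.0
    return np.diag(A.sum(axis=1)) - A


def sdspcaan_objective(X, Y, Q, S, gamma, alpha, beta, delta, lam):
    """Evaluate the SDSPCAAN objective from the formula summary.

    X: (n, d) data, Y: (n, c) labels, Q: (n, k), S: (n, n) similarities,
    gamma: (n,) weights, assumed to form Gamma = diag(gamma).
    """
    L = graph_laplacian(S)
    P = Q @ Q.T            # Q Q^T, shape (n, n)
    Z = X @ (X.T @ Q)      # X X^T Q, shape (n, k)
    global_x = np.linalg.norm(X - P @ X, "fro") ** 2
    global_y = np.linalg.norm(Y - P @ Y, "fro") ** 2
    local = (
        2.0 * np.trace(Z.T @ L @ Z)
        + np.trace(S.T @ np.diag(gamma) @ S)
        + 2.0 * lam * np.trace(Y.T @ L @ Y)
    )
    total = global_x + alpha * global_y + beta * l21_norm(Q) + 0.5 * delta * local
    return float(total)


# Toy usage on random data; every value below is illustrative only.
rng = np.random.default_rng(0)
n, d, c, k = 6, 4, 2, 2
X = rng.normal(size=(n, d))
Y = np.eye(c)[rng.integers(0, c, size=n)]      # one-hot labels
Q, _ = np.linalg.qr(rng.normal(size=(n, k)))   # orthonormal columns
S = rng.random((n, n))
np.fill_diagonal(S, 0.0)
S /= S.sum(axis=1, keepdims=True)              # rows sum to 1 (demo choice)
value = sdspcaan_objective(X, Y, Q, S, np.ones(n),
                           alpha=1.0, beta=0.1, delta=0.5, lam=1.0)
print(value)
```

Under these assumptions, the term \(2\text{Tr}(Z^T L Z)\) with \(Z = XX^TQ\) equals the pairwise sum \(\sum_{i,j}\|z_i - z_j\|_2^2 S_{ij}\), which is how the trace form in the SDSPCAAN objective relates to the first term of the PCAN formula.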