Divergence Maximizing Linear Projection for Supervised Dimension Reduction

Biao Chen, Joshua Kortje
2024-08-12
Abstract: This paper proposes two linear projection methods for supervised dimension reduction using only the first and second-order statistics. The methods, each catering to a different parameter regime, are derived under the general Gaussian model by maximizing the Kullback-Leibler divergence between the two classes in the projected sample for a binary classification problem. They subsume existing linear projection approaches developed under simplifying assumptions of Gaussian distributions, such as these distributions might share an equal mean or covariance matrix. As a by-product, we establish that the multi-class linear discriminant analysis, a celebrated method for classification and supervised dimension reduction, is provably optimal for maximizing pairwise Kullback-Leibler divergence when the Gaussian populations share an identical covariance matrix. For the case when the Gaussian distributions share an equal mean, we establish conditions under which the optimal subspace remains invariant regardless of how the Kullback-Leibler divergence is defined, despite the asymmetry of the divergence measure itself. Such conditions encompass the classical case of signal plus noise, where both the signal and noise have zero mean and arbitrary covariance matrices. Experiments are conducted to validate the proposed solutions, demonstrate their superior performance over existing alternatives, and illustrate the procedure for selecting the appropriate linear projection solution.
Information Theory
### What problem does this paper attempt to solve?

This paper addresses a key problem in supervised dimension reduction (SDR) for binary classification: how to find the optimal low-dimensional representation under the general Gaussian model. Specifically, it proposes two linear projection methods that use only first-order and second-order statistics. Each method applies to a different parameter regime, and both are obtained by maximizing the Kullback-Leibler divergence (KLD) between the two classes in the projected sample.

#### Main problem description

1. **Challenges in supervised dimension reduction**:
   - In supervised learning, an important question is how to use label information effectively for dimension reduction. Traditional PCA is an unsupervised method and does not consider label information.
   - For high-dimensional data, especially when the data dimension is much larger than the sample size, finding an effective low-dimensional representation is particularly important.
2. **Limitations of existing methods**:
   - Existing linear projection methods are usually developed under simplifying assumptions, such as the Gaussian distributions of the different classes sharing the same mean or the same covariance matrix.
   - These assumptions limit the applicability and performance of the methods, especially when the actual data does not conform to them.
3. **Objectives**:
   - The paper aims to propose linear projection methods that maximize the KLD in the more general situation where the Gaussian distributions of the different classes share neither the same mean nor the same covariance matrix.
   - With such a projection, the discriminative information between classes is better preserved in the low-dimensional space, which in turn improves classification performance.
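To make the objective above concrete, the following minimal sketch evaluates the KLD between two Gaussian classes before and after a linear projection \( y = A^T x \), using the standard closed-form KLD between two Gaussians. The function names (`gaussian_kld`, `projected_kld`), the random test parameters, and the randomly drawn matrix `A` are illustrative assumptions, not taken from the paper; the paper's contribution is choosing `A` to maximize this quantity, which this sketch does not attempt.

```python
import numpy as np

def gaussian_kld(mu1, cov1, mu2, cov2):
    """Closed-form D(p1 || p2) for p1 = N(mu1, cov1), p2 = N(mu2, cov2)."""
    d = mu1.shape[0]
    diff = mu2 - mu1
    # slogdet avoids overflow/underflow that log(det(...)) can suffer from
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    trace_term = np.trace(np.linalg.solve(cov2, cov1))  # tr(cov2^{-1} cov1)
    mean_term = diff @ np.linalg.solve(cov2, diff)       # Mahalanobis-type term
    return 0.5 * (logdet2 - logdet1 - d + trace_term + mean_term)

def projected_kld(A, mu1, cov1, mu2, cov2):
    """KLD between the two classes after the linear map y = A.T @ x."""
    return gaussian_kld(A.T @ mu1, A.T @ cov1 @ A, A.T @ mu2, A.T @ cov2 @ A)

# Hypothetical example: two 5-dimensional classes projected down to 2 dimensions.
rng = np.random.default_rng(0)
d, r = 5, 2
mu1, mu2 = np.zeros(d), rng.normal(size=d)
B1, B2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
cov1 = B1 @ B1.T + np.eye(d)   # symmetric positive definite by construction
cov2 = B2 @ B2.T + np.eye(d)
A = rng.normal(size=(d, r))    # an arbitrary (not optimized) projection

full_kld = gaussian_kld(mu1, cov1, mu2, cov2)
proj_kld = projected_kld(A, mu1, cov1, mu2, cov2)
print(f"KLD in full space: {full_kld:.4f}, after projection: {proj_kld:.4f}")
```

Any projection can only lose divergence (the projected KLD never exceeds the full-space KLD), so the design question is which `A` of a given rank loses the least.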
#### Specific problems and solutions

- **Problem**: How to find the optimal low-dimensional representation under the general Gaussian model so that the discriminative information between the classes is maximized in the projected samples?
- **Solutions**:
  1. **Large-μ regime**: When the mean difference between the classes is large, the proposed method maximizes the KLD by retaining the mean difference term \( D_\mu(p_1 \| p_2) \) in full and retaining as much of the covariance difference term \( D_\Sigma(p_1 \| p_2) \) as possible.
     - Specific algorithm: First, find the direction \( a_1=\Sigma_2^{-1}(\mu_2 - \mu_1) \), and then select the generalized eigenvectors that maximize \( g(\lambda) \).
  2. **Small-μ regime**: When the mean difference is small, the proposed method exploits the additivity of the KLD: the overall KLD is maximized by summing the KLDs of independent observations.
     - Specific algorithm: Compute the generalized eigendecomposition of the covariance matrix pair and select the appropriate eigenvectors to construct the projection matrix.

#### Conclusion

With these two methods, the paper not only solves the supervised dimension reduction problem under the general Gaussian model, but also provides theoretical support for multi-class classification by proving that multi-class linear discriminant analysis (LDA) is optimal for maximizing pairwise KLD when the Gaussian populations share an identical covariance matrix. In addition, the experimental results validate the proposed methods and demonstrate their superior performance over existing alternatives.

Formula summary:

- KLD formula:
  \[ D(p_1 \| p_2)=\frac{1}{2}\left[\ln\left|\Sigma_2\right|-\ln\left|\Sigma_1\right|-d + \text{tr}(\Sigma_2^{-1}\Sigma_1)+(\mu_2 - \mu_1)^T\Sigma_2^{-1}(\mu_2 - \mu_1)\right] \]
- Mean difference term \( D_\mu \) and covariance difference term \( D_\Sigma \):
  \[ D_\mu(p_1 \| p_2)=\frac{1}{2}(\mu_2 - \mu_1)^T\Sigma_2^{-1}(\mu_2 - \mu_1) \]
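The two selection procedures listed under "Specific problems and solutions" can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation. It assumes that \( g(\lambda) = \lambda - \ln\lambda - 1 \), which is the contribution of one generalized eigenvalue \( \lambda \) of the pair \( (\Sigma_1, \Sigma_2) \) to the non-mean part of the KLD; the summary does not define \( g \). It also assumes that the small-μ selection ranks each generalized eigenvector by its own additive KLD contribution, and that the large-μ projection simply stacks \( a_1 \) with the top \( r-1 \) eigenvectors by \( g(\lambda) \). All function names and test parameters are hypothetical.

```python
import numpy as np

def gaussian_kld(mu1, cov1, mu2, cov2):
    """Closed-form D(p1 || p2) for p1 = N(mu1, cov1), p2 = N(mu2, cov2)."""
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (logdet2 - logdet1 - len(diff)
                  + np.trace(np.linalg.solve(cov2, cov1))
                  + diff @ np.linalg.solve(cov2, diff))

def projected_kld(A, mu1, cov1, mu2, cov2):
    """KLD between the classes after the linear map y = A.T @ x."""
    return gaussian_kld(A.T @ mu1, A.T @ cov1 @ A, A.T @ mu2, A.T @ cov2 @ A)

def generalized_eig(cov1, cov2):
    """Solve cov1 v = lam * cov2 v, scaled so that V.T @ cov2 @ V = I."""
    L = np.linalg.cholesky(cov2)            # cov2 = L @ L.T
    Linv = np.linalg.inv(L)
    M = Linv @ cov1 @ Linv.T                # whitened cov1
    lam, U = np.linalg.eigh((M + M.T) / 2)  # symmetrize against round-off
    return lam, Linv.T @ U

def g(lam):
    # ASSUMPTION: per-eigenvalue covariance contribution (before the 1/2 factor)
    return lam - np.log(lam) - 1.0

def small_mu_projection(mu1, cov1, mu2, cov2, r):
    """Keep the r generalized eigenvectors with the largest additive KLD share."""
    lam, V = generalized_eig(cov1, cov2)
    shift = V.T @ (mu2 - mu1)               # mean difference along each eigenvector
    scores = 0.5 * (g(lam) + shift**2)      # per-component KLD; components are independent
    keep = np.argsort(scores)[::-1][:r]
    return V[:, keep], scores[keep].sum()

def large_mu_projection(mu1, cov1, mu2, cov2, r):
    """Stack a1 = cov2^{-1}(mu2 - mu1) with the r-1 eigenvectors of largest g(lam)."""
    a1 = np.linalg.solve(cov2, mu2 - mu1)
    lam, V = generalized_eig(cov1, cov2)
    keep = np.argsort(g(lam))[::-1][: r - 1]
    return np.column_stack([a1, V[:, keep]])

# Hypothetical test problem: 6 dimensions reduced to 2.
rng = np.random.default_rng(1)
d, r = 6, 2
B1, B2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
cov1, cov2 = B1 @ B1.T + np.eye(d), B2 @ B2.T + np.eye(d)
mu1, mu2 = np.zeros(d), 0.1 * rng.normal(size=d)

A_small, score_sum = small_mu_projection(mu1, cov1, mu2, cov2, r)
A_large = large_mu_projection(mu1, cov1, mu2, cov2, r)
kld_full = gaussian_kld(mu1, cov1, mu2, cov2)
kld_small = projected_kld(A_small, mu1, cov1, mu2, cov2)
kld_large = projected_kld(A_large, mu1, cov1, mu2, cov2)
d_mu = 0.5 * (mu2 - mu1) @ np.linalg.solve(cov2, mu2 - mu1)
print(f"full: {kld_full:.4f}  small-mu: {kld_small:.4f}  large-mu: {kld_large:.4f}")
```

In the generalized eigenbasis the projected covariances are diagonal, so the per-component scores add up exactly to the projected KLD. Projecting onto \( a_1 \) alone already preserves the full mean difference term \( D_\mu \), which is why it is kept as the first direction in the large-μ sketch. Comparing `kld_small` and `kld_large` on a given problem is one simple way to see which sketch suits the parameter regime at hand.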