Abstract: This paper proposes two linear projection methods for supervised dimension reduction using only the first and second-order statistics. The methods, each catering to a different parameter regime, are derived under the general Gaussian model by maximizing the Kullback-Leibler divergence between the two classes in the projected sample for a binary classification problem. They subsume existing linear projection approaches developed under simplifying assumptions on the Gaussian distributions, such as the distributions sharing an equal mean or an equal covariance matrix. As a by-product, we establish that multi-class linear discriminant analysis, a celebrated method for classification and supervised dimension reduction, is provably optimal for maximizing pairwise Kullback-Leibler divergence when the Gaussian populations share an identical covariance matrix. For the case when the Gaussian distributions share an equal mean, we establish conditions under which the optimal subspace remains invariant regardless of how the Kullback-Leibler divergence is defined, despite the asymmetry of the divergence measure itself. Such conditions encompass the classical case of signal plus noise, where both the signal and noise have zero mean and arbitrary covariance matrices. Experiments are conducted to validate the proposed solutions, demonstrate their superior performance over existing alternatives, and illustrate the procedure for selecting the appropriate linear projection solution.

### What problem does this paper attempt to solve?

This paper aims to solve a key problem in Supervised Dimension Reduction (SDR), especially in binary classification problems, that is, how to find the optimal low-dimensional representation under the general Gaussian model. Specifically, the paper proposes two linear projection methods for supervised dimension reduction when only using first-order and second-order statistics. These methods are applicable to different parameter conditions respectively and are achieved by maximizing the Kullback-Leibler Divergence (KLD) of samples of the two classes after projection.

#### Main problem description

1. **Challenges in supervised dimension reduction**:
   - In supervised learning, how to effectively use label information for dimension reduction is an important issue. Traditional PCA is an unsupervised method and does not consider label information.
   - For high-dimensional data, especially when the data dimension is much larger than the sample size, finding an effective low-dimensional representation is particularly important.
2. **Limitations of existing methods**:
   - Existing linear projection methods are usually developed under simplified assumptions, such as assuming that Gaussian distributions of different classes share the same mean or covariance matrix.
   - These assumptions limit the applicability and performance of the methods, especially when the actual data does not conform to these assumptions.
3. **Objectives**:
   - The objective of the paper is to propose a new linear projection method that can maximize KLD in a more general situation (that is, Gaussian distributions of different classes share neither the same mean nor the same covariance matrix).
   - Through this method, the discriminative information between classes can be better preserved in the low-dimensional space, thereby improving classification performance.
#### Specific problems and solutions

- **Problem**: How to find the optimal low-dimensional representation under the general Gaussian model so that the discriminative information of classes is maximized in the projected samples?
- **Solutions**:
  1. **Large-μ regime**: When the mean difference between classes is large, the proposed method maximizes KLD by retaining the mean difference term \( D_\mu(p_1 \| p_2) \) and as much of the covariance difference term \( D_\Sigma(p_1 \| p_2) \) as possible.
     - Specific algorithm: First, find the direction \( a_1=\Sigma_2^{-1}(\mu_2 - \mu_1) \), and then select the generalized eigenvectors that maximize \( g(\lambda) \).
  2. **Small-μ regime**: When the mean difference is small, the proposed method uses the additivity of KLD. The overall KLD is maximized by summing the KLD of independent observations.
     - Specific algorithm: Compute the generalized eigen-decomposition of the covariance matrix pair and select appropriate eigenvectors to construct the projection matrix.

#### Conclusion

Through these two methods, the paper not only solves the supervised dimension reduction problem under the general Gaussian model, but also provides theoretical support for multi-class classification problems, proving that multi-class Linear Discriminant Analysis (LDA) is optimal under specific conditions. In addition, the experimental results verify the effectiveness and superiority of the proposed methods.

Formula summary:

- KLD formula, where \( d \) is the data dimension:
  \[ D(p_1 \| p_2)=\frac{1}{2}\left[\ln\left|\Sigma_2\right|-\ln\left|\Sigma_1\right|-d + \text{tr}(\Sigma_2^{-1}\Sigma_1)+(\mu_2 - \mu_1)^T\Sigma_2^{-1}(\mu_2 - \mu_1)\right] \]
- Mean difference term \( D_\mu \) and covariance difference term \( D_\Sigma \), with \( D_\Sigma \) being the part of the KLD formula that remains after removing \( D_\mu \), so that \( D = D_\mu + D_\Sigma \):
  \[ D_\mu(p_1 \| p_2)=\frac{1}{2}(\mu_2 - \mu_1)^T\Sigma_2^{-1}(\mu_2 - \mu_1) \]
  \[ D_\Sigma(p_1 \| p_2)=\frac{1}{2}\left[\ln\left|\Sigma_2\right|-\ln\left|\Sigma_1\right|-d + \text{tr}(\Sigma_2^{-1}\Sigma_1)\right] \]
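The formulas above can be checked numerically. The following is a minimal NumPy sketch, not the paper's implementation. It rests on several assumptions: the toy parameters are made up, the helper names `gaussian_kld_terms` and `project` are hypothetical, \( D_\Sigma \) is taken to be the KLD formula with the \( D_\mu \) term removed, and the per-direction quantity \( \frac{1}{2}(\lambda - \ln\lambda - 1) \) is used as a stand-in for \( g(\lambda) \), which the summary does not define. The sketch shows that projecting onto \( a_1=\Sigma_2^{-1}(\mu_2 - \mu_1) \) alone preserves \( D_\mu \), and that generalized eigenvectors of the covariance pair split \( D_\Sigma \) into additive per-direction contributions.

```python
import numpy as np

def gaussian_kld_terms(mu1, cov1, mu2, cov2):
    """Return (D_mu, D_Sigma) for D(p1 || p2) between two Gaussians."""
    d = mu1.shape[0]
    diff = mu2 - mu1
    # Mean difference term: 0.5 * diff^T cov2^{-1} diff
    d_mu = 0.5 * float(diff @ np.linalg.solve(cov2, diff))
    # Covariance difference term (assumed: KLD minus the mean term)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    trace_term = np.trace(np.linalg.solve(cov2, cov1))
    d_sigma = 0.5 * float(logdet2 - logdet1 - d + trace_term)
    return d_mu, d_sigma

def project(A, mu, cov):
    """Mean and covariance of y = A^T x for x ~ N(mu, cov); A is (d, r)."""
    return A.T @ mu, A.T @ cov @ A

# Toy parameters, made up for illustration only
rng = np.random.default_rng(0)
d = 5
mu1 = np.zeros(d)
mu2 = rng.normal(size=d)
B1, B2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
cov1 = B1 @ B1.T + d * np.eye(d)   # symmetric positive definite
cov2 = B2 @ B2.T + d * np.eye(d)

d_mu_full, d_sigma_full = gaussian_kld_terms(mu1, cov1, mu2, cov2)

# Large-mu regime, first direction: a_1 = cov2^{-1} (mu2 - mu1)
a1 = np.linalg.solve(cov2, mu2 - mu1).reshape(d, 1)
d_mu_proj, d_sigma_proj = gaussian_kld_terms(
    *project(a1, mu1, cov1), *project(a1, mu2, cov2)
)

# Generalized eigen-decomposition cov1 v = lam * cov2 v via Cholesky whitening
L = np.linalg.cholesky(cov2)                 # cov2 = L L^T
L_inv = np.linalg.inv(L)
lam, U = np.linalg.eigh(L_inv @ cov1 @ L_inv.T)
V = L_inv.T @ U                              # columns are generalized eigenvectors

# Assumed per-direction contribution to D_Sigma (stand-in for g(lambda))
gain = 0.5 * (lam - np.log(lam) - 1.0)

# Keep the r directions with the largest contribution
r = 2
top = np.argsort(gain)[::-1][:r]
A_r = V[:, top]
_, d_sigma_top = gaussian_kld_terms(
    *project(A_r, mu1, cov1), *project(A_r, mu2, cov2)
)

print("D_mu full vs. projected on a_1:", d_mu_full, d_mu_proj)
print("D_Sigma full vs. sum of per-direction gains:", d_sigma_full, gain.sum())
print("D_Sigma kept by top-r eigenvectors:", d_sigma_top)
```

Whitening with the Cholesky factor of \( \Sigma_2 \) keeps the sketch NumPy-only; a generalized symmetric eigensolver would serve the same purpose. How the paper combines \( a_1 \) with the selected eigenvectors into a single projection matrix is not spelled out in the summary, so that step is left out here.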

Divergence Maximizing Linear Projection for Supervised Dimension Reduction
