Abstract: The problem of finding a reduced-dimensionality representation of categorical variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Specifically, given a co-occurrence matrix of two variables, one often seeks a compact representation of one variable which preserves information about the other variable. We have recently introduced "Sufficient Dimensionality Reduction" [GT-2003], a method that extracts continuous reduced-dimensional features whose measurements (i.e., expectation values) capture maximal mutual information among the variables. However, such measurements often capture information that is irrelevant for a given task. Widely known examples are illumination conditions, which are irrelevant as features for face recognition; writing style, which is irrelevant as a feature for content classification; and intonation, which is irrelevant as a feature for speech recognition. Such irrelevance cannot be deduced a priori, since it depends on the details of the task, and is thus inherently ill-defined in the purely unsupervised case. Separating relevant from irrelevant features can be achieved using additional side data that contains such irrelevant structures. This approach was taken in [CT-2002], extending the information bottleneck method, which uses clustering to compress the data. Here we use this side-information framework to identify features whose measurements are maximally informative for the original data set, but carry as little information as possible on a side data set. In statistical terms this can be understood as extracting statistics which are maximally sufficient for the original data set, while simultaneously maximally ancillary for the side data set. We formulate this tradeoff as a constrained optimization problem and characterize its solutions. We then derive a gradient descent algorithm for this problem, which is based on the Generalized Iterative Scaling method for finding maximum entropy distributions. The method is demonstrated on synthetic data, as well as on real face recognition datasets, and is shown to outperform standard methods such as oriented PCA.
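
As a rough illustration of the quantities the abstract refers to, the following sketch computes the mutual information of a co-occurrence matrix and combines the values for a main and a side data set into a single score. It is not the paper's algorithm: the function names `mutual_information` and `tradeoff_score`, the weight `lam`, and the difference form of the score are all assumptions made here for illustration, and the abstract does not state the exact objective.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (in nats) of the joint distribution
    obtained by normalizing a co-occurrence count matrix."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                      # joint distribution p(x, y)
    px = p.sum(axis=1, keepdims=True)    # marginal p(x), column vector
    py = p.sum(axis=0, keepdims=True)    # marginal p(y), row vector
    mask = p > 0                         # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

def tradeoff_score(info_main, info_side, lam=1.0):
    """Hypothetical tradeoff: reward information about the main data,
    penalize information about the side data (assumed form)."""
    return info_main - lam * info_side

# Toy co-occurrence matrices (made up for illustration only).
main_counts = [[1, 0], [0, 1]]   # perfectly dependent variables
side_counts = [[1, 1], [1, 1]]   # independent variables

i_main = mutual_information(main_counts)
i_side = mutual_information(side_counts)
score = tradeoff_score(i_main, i_side, lam=1.0)
```

In this toy case the main matrix carries the maximum information possible for two binary variables and the side matrix carries none, so the score equals the main information. In the setting the abstract describes, these quantities would be evaluated on feature measurements rather than on the raw matrices.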

A New Covariance Estimator for Sufficient Dimension Reduction in High-Dimensional and Undersized Sample Problems

Large-Dimensional Positive Definite Covariance Estimation for High Frequency Data via Low-rank and Sparse Matrix Decomposition

Minimum Covariance Determinant: Spectral Embedding and Subset Size Determination

A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization

A New Estimator for Efficient Dimension Reduction in Regression

A Regularized High-Dimensional Positive Definite Covariance Estimator with High-Frequency Data

Learning Heterogeneity in Causal Inference Using Sufficient Dimension Reduction

Functional sufficient dimension reduction through distance covariance

Coupled regularized sample covariance matrix estimator for multiple classes

Minimum Covariance Determinant and Extensions

Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction

Shrinkage MMSE estimators of covariances beyond the zero-mean and stationary variance assumptions

On Estimating Regression-Based Causal Effects Using Sufficient Dimension Reduction

A new sufficient dimension reduction method via rank divergence

Sufficient Dimensionality Reduction with Irrelevant Statistics

Sufficient dimension reduction for average causal effect estimation

On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification

On expectile-assisted inverse regression estimation for sufficient dimension reduction

Multiple multi-sample testing under arbitrary covariance dependency