Nonparametric Bayes Classification via Learning of Affine Subspaces

Abhishek Bhattacharya

DOI: https://doi.org/10.48550/arXiv.1301.0681

2013-01-04

Abstract:The goal of this presentation is to build an efficient non-parametric Bayes classifier in the presence of large numbers of predictors. When analyzing such data, parametric models are often too inflexible while non-parametric procedures tend to be non-robust because of insufficient data on these high dimensional spaces. When dealing with these types of data, it is often the case that most of the variability tends to lie along a few directions, or more generally along a much smaller dimensional subspace of the feature space. Hence a class of regression models is proposed that flexibly learn about this subspace while simultaneously performing dimension reduction in classification. This methodology, allows the cell probabilities to vary non-parametrically based on a few coordinates expressed as linear combinations of the predictors. Also, as opposed to many black-box methods for dimensionality reduction, the proposed model is appealing in having clearly interpretable and identifiable parameters which provide insight into which predictors are important in determining accurate classification boundaries. Gibbs sampling methods are developed for posterior computations. The estimated cell probabilities are theoretically shown to be consistent, and real data applications are included to support the findings.

Methodology

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to construct efficient non - parametric Bayesian classifiers in the presence of a large number of predictor variables. Specifically, the paper focuses on how to build classifiers under the following conditions: 1. **Handling a large number of predictor variables**: How to effectively classify when the data set contains a large number of predictor variables. 2. **Allowing different cell probabilities to vary non - parametrically**: The probabilities of different classes can vary non - parametrically based on a few coordinates (which are linear combinations of predictor variables). 3. **Interpretability of model parameters**: Model parameters should be clearly interpretable and be able to provide insights into which predictor variables play an important role in constructing accurate classification boundaries. 4. **Consistency of estimated cell probabilities**: The estimated cell probabilities are consistent in both the weak and strong senses. 5. **Data application to support results**: Verify the effectiveness of the model through the application of actual data sets. To achieve these goals, the paper proposes a non - parametric Bayesian classification method based on learning affine subspaces. This method can not only work effectively in high - dimensional data, but also provide insights into which predictor variables have a greater impact on the classification results.

Nonparametric Bayes Classification via Learning of Affine Subspaces

Efficient Bayesian High-Dimensional Classification via Random Projection with Application to Gene Expression Data

Bayesian High-dimensional Linear Regression with Sparse Projection-posterior

High-dimensional Grouped-regression using Bayesian Sparse Projection-posterior

Optimal Bayes Classifiers for Functional Data and Density Ratios

A Generic Ensemble Approach to Estimate Multidimensional Likelihood in Bayesian Classifier Learning.

Scalable Bayesian regression in high dimensions with multiple data sources

A Nonparametric Bayesian Technique for High-Dimensional Regression

Learning from a lot: Empirical Bayes in high-dimensional prediction settings

Bayesian Nonparametric Covariance Regression

DP-space: Bayesian Nonparametric Subspace Clustering with Small-variance Asymptotics.

High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression

Bayesian Inference with Posterior Regularization and Applications to Infinite Latent SVMs

A Naïve Bayes Regularized Logistic Regression Estimator for Low-dimensional Classification

Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

Bayes optimal learning in high-dimensional linear regression with network side information

A Fully Nonparametric Modelling Approach to Binary Regression

Asymptotically faster estimation of high‐dimensional additive models using subspace learning

On high-dimensional classification by sparse generalized Bayesian logistic regression