Nonparametric Bayes Classification via Learning of Affine Subspaces

Abhishek Bhattacharya
DOI: https://doi.org/10.48550/arXiv.1301.0681
2013-01-04
Abstract:The goal of this presentation is to build an efficient non-parametric Bayes classifier in the presence of large numbers of predictors. When analyzing such data, parametric models are often too inflexible while non-parametric procedures tend to be non-robust because of insufficient data on these high dimensional spaces. When dealing with these types of data, it is often the case that most of the variability tends to lie along a few directions, or more generally along a much smaller dimensional subspace of the feature space. Hence a class of regression models is proposed that flexibly learn about this subspace while simultaneously performing dimension reduction in classification. This methodology, allows the cell probabilities to vary non-parametrically based on a few coordinates expressed as linear combinations of the predictors. Also, as opposed to many black-box methods for dimensionality reduction, the proposed model is appealing in having clearly interpretable and identifiable parameters which provide insight into which predictors are important in determining accurate classification boundaries. Gibbs sampling methods are developed for posterior computations. The estimated cell probabilities are theoretically shown to be consistent, and real data applications are included to support the findings.
Methodology
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to construct efficient non - parametric Bayesian classifiers in the presence of a large number of predictor variables. Specifically, the paper focuses on how to build classifiers under the following conditions: 1. **Handling a large number of predictor variables**: How to effectively classify when the data set contains a large number of predictor variables. 2. **Allowing different cell probabilities to vary non - parametrically**: The probabilities of different classes can vary non - parametrically based on a few coordinates (which are linear combinations of predictor variables). 3. **Interpretability of model parameters**: Model parameters should be clearly interpretable and be able to provide insights into which predictor variables play an important role in constructing accurate classification boundaries. 4. **Consistency of estimated cell probabilities**: The estimated cell probabilities are consistent in both the weak and strong senses. 5. **Data application to support results**: Verify the effectiveness of the model through the application of actual data sets. To achieve these goals, the paper proposes a non - parametric Bayesian classification method based on learning affine subspaces. This method can not only work effectively in high - dimensional data, but also provide insights into which predictor variables have a greater impact on the classification results.