Abstract: In this paper, we propose a sparse learning algorithm, probabilistic classification vector machines (PCVMs). We analyze relevance vector machines (RVMs) for classification problems and observe that adopting the same prior for different classes may lead to unstable solutions. To tackle this problem, PCVMs adopt a signed and truncated Gaussian prior over every weight, where the sign of the prior is determined by the class label, i.e., +1 or -1. The truncated Gaussian prior not only restricts the sign of the weights but also leads to a sparse estimate of the weight vector, and thus controls the complexity of the model. In PCVMs, the kernel parameters can be optimized simultaneously within the training algorithm. The performance of PCVMs is extensively evaluated on four synthetic data sets and 13 benchmark data sets using three performance metrics: error rate (ERR), area under the receiver operating characteristic curve (AUC), and root mean squared error (RMSE). We compare PCVMs with soft-margin support vector machines (SVM(Soft)), hard-margin support vector machines (SVM(Hard)), SVMs whose kernel parameters are optimized by PCVMs (SVM(PCVM)), RVMs, and several other baseline classifiers. We compare the algorithms on single data sets with the 5×2 cross-validation F test (five replications of twofold cross-validation) and over multiple data sets with the Friedman test and its corresponding post-hoc test. These tests show that PCVMs outperform the other algorithms, including SVM(Soft), SVM(Hard), RVM, and SVM(PCVM), on most of the data sets under the three metrics, especially under AUC. Our results also reveal that SVM(PCVM) performs slightly better than SVM(Soft), implying that the parameter optimization algorithm in PCVMs is better than cross-validation in terms of both performance and computational complexity.
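The evaluation statistics named above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' evaluation code: all function names are hypothetical, labels are assumed to be in {-1, +1}, AUC is computed as the Mann-Whitney probability that a positive example outscores a negative one, and RMSE is assumed to compare the predicted probability P(y = +1 | x) with 0/1 targets. The 5×2 cv F statistic follows Alpaydin's formulation, where p_i^(j) is the difference in error rate between two classifiers on fold j of replication i; the Friedman statistic is computed from average ranks, with tied algorithms sharing a rank.

```python
import math
from statistics import mean

def error_rate(y_true, y_pred):
    # Fraction of misclassified examples; labels assumed to be -1/+1.
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    # Mann-Whitney estimate: P(score of a positive > score of a negative), ties count 1/2.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, probs):
    # Assumed convention: compare P(y=+1|x) with a 0/1 encoding of the label.
    return math.sqrt(mean(((1.0 if t == 1 else 0.0) - p) ** 2 for t, p in zip(y_true, probs)))

def f_test_5x2cv(diffs):
    # diffs: five (p_i^(1), p_i^(2)) pairs of per-fold error differences.
    # Under H0 the statistic is approximately F-distributed with (10, 5) degrees of freedom.
    num = sum(p ** 2 for pair in diffs for p in pair)
    den = 0.0
    for p1, p2 in diffs:
        pbar = (p1 + p2) / 2
        den += (p1 - pbar) ** 2 + (p2 - pbar) ** 2  # s_i^2 for replication i
    return num / (2 * den)

def friedman(scores):
    # scores[d][a]: error of algorithm a on data set d (lower is better, so rank 1 is best).
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: row[a])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # tied algorithms share the average rank
            for m in range(i, j + 1):
                rank_sums[order[m]] += avg_rank
            i = j + 1
    avg_ranks = [r / n for r in rank_sums]
    # Friedman chi-square statistic with k - 1 degrees of freedom.
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2
```

Turning these statistics into p-values requires the F(10, 5) and χ²(k − 1) distributions (for example from `scipy.stats`), and the post-hoc comparison that follows a significant Friedman test is not shown.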
We also discuss the advantages of the PCVM formulation through maximum a posteriori (MAP) analysis and margin analysis, which help explain the empirical success of PCVMs.
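To make the MAP view concrete, the following Python sketch evaluates an unnormalized log posterior for a PCVM-style model. It assumes a probit link, p(y = +1 | x) = Φ(Σ_i w_i φ_i(x) + b), with Gaussian RBF basis functions φ_i(x) = exp(-‖x − x_i‖²/θ²) centred on the training points, as in RVMs; the link, the kernel form, and every function name here are illustrative assumptions rather than the paper's published code. The prior is the signed truncated Gaussian described in the abstract: 2·N(w_i | 0, 1/α_i) when y_i·w_i ≥ 0 and zero otherwise, so any weight whose sign disagrees with the label of its basis point drives the log posterior to -∞.

```python
import math

def rbf(x, z, theta):
    # Assumed Gaussian RBF kernel with width parameter theta.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / theta ** 2)

def probit(u):
    # Standard normal CDF, Phi(u); the probit link is an assumption of this sketch.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def log_truncated_prior(w, y, alpha):
    # Signed truncated Gaussian: density 2 * N(w | 0, 1/alpha) on the half-line y*w >= 0, zero elsewhere.
    if y * w < 0:
        return -math.inf
    return math.log(2.0) + 0.5 * math.log(alpha / (2 * math.pi)) - 0.5 * alpha * w * w

def log_posterior(w, b, alphas, X, Y, theta):
    # Unnormalized log p(w | data): truncated-Gaussian log prior plus probit log likelihood,
    # with one (hypothetical) basis function per training point.
    lp = sum(log_truncated_prior(wi, yi, ai) for wi, yi, ai in zip(w, Y, alphas))
    if lp == -math.inf:
        return lp  # a weight has the wrong sign for its class
    for x, y in zip(X, Y):
        f = sum(wi * rbf(x, xi, theta) for wi, xi in zip(w, X)) + b
        lp += math.log(probit(y * f))
    return lp
```

For example, with `X = [[0.0], [1.0]]` and `Y = [1, -1]`, weights `[0.5, -0.5]` give a finite log posterior, while `[0.5, 0.5]` give -∞ because the second weight's sign contradicts its label. A large α_i squeezes the half-Gaussian toward zero, which is how such a prior can prune basis functions; the procedure that updates α, b, and θ during training is not shown.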

Sparse Bayesian Approach for Feature Selection

Probabilistic Feature Selection and Classification Vector Machine

Uncertainty-Based Active Learning Via Sparse Modeling for Image Classification

Probabilistic Classification Vector Machines

Efficient Probabilistic Classification Vector Machine with Incremental Basis Function Selection

Fully Bayesian logistic regression with hyper-LASSO priors for high-dimensional feature selection

Fully Bayesian Classification with Heavy-tailed Priors for Selection in High-dimensional Features with Grouping Structure

Parameters Selection in Gene Selection Using Gaussian Kernel Support Vector Machines by Genetic Algorithm

Convex Sparse PCA for Unsupervised Feature Learning

Probabilistic Classifiers with a Generalized Gaussian Scale Mixture Prior

A Convex Sparse PCA for Feature Analysis

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

High-dimensional Feature Selection Using Hierarchical Bayesian Logistic Regression with Heavy-tailed Priors

Optimal Feature Selection for Sparse Linear Discriminant Analysis and Its Applications in Gene Expression Data

Efficient Probabilistic Latent Semantic Analysis with Sparsity Control

Band selection based Gaussian processes for hyperspectral remote sensing images classification

Multi-classification Algorithm Based on Truncated Gaussian Prior and Variational Bayesian

Spike and slab Bayesian sparse principal component analysis

Multi-class feature selection via Sparse Softmax with a discriminative regularization

Probabilistic Classification Vector Machine for Multi-Class Classification

Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring