Abstract: When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for regression/classification: one described by Spiegelhalter et al. (1993), and another by Buntine (1993). We contrast these two criteria using the prequential framework of Dawid (1984), and give sufficient conditions under which the criteria agree.
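
The factorization the abstract describes can be written out as a short sketch. The symbols \(\theta_{y|x}\), \(\theta_x\), \(m\), and \(D\) are notation assumed here for illustration (parameters of the conditional model, parameters of the input model, the model structure, and the \(N\) observed cases); they are not taken from the abstract. A model of this kind has a likelihood that factors and parameter sets that are independent a priori:

\[
p(y, x \mid \theta_{y|x}, \theta_x, m) = p(y \mid x, \theta_{y|x}, m)\, p(x \mid \theta_x, m),
\qquad
p(\theta_{y|x}, \theta_x \mid m) = p(\theta_{y|x} \mid m)\, p(\theta_x \mid m).
\]

Under these two assumptions, the posterior for the conditional parameters depends on the data only through the conditional likelihood:

\[
p(\theta_{y|x} \mid D, m) \propto p(\theta_{y|x} \mid m) \prod_{l=1}^{N} p(y_l \mid x_l, \theta_{y|x}, m),
\]

which is why the input model can be set aside when only \(p(Y|X)\) is of interest.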

What problem does this paper attempt to address?

The main problem this paper addresses is how to select appropriate Bayesian models for regression and classification tasks. Specifically, the authors focus on the conditional probability distribution \(p(Y|X)\), where \(Y\) is the outcome or class variable and \(X\) is the set of explanatory or input variables. The paper explores the following issues:

1. **Definition and Characteristics of Bayesian Regression/Classification (BRC) Models**:
   - The authors introduce a special class of Bayesian models, called Bayesian regression/classification (BRC) models. These models factor into an independent conditional model \(p(Y|X)\) and an input model \(p(X)\).
   - The advantage of BRC models is that the conditional model can be analyzed by itself, which simplifies computation and interpretation.
2. **Conversion of Arbitrary Bayesian Models to BRC Models**:
   - The paper examines the practice of converting an arbitrary Bayesian model to a BRC model and argues that this practice is often inappropriate, because it ignores prior knowledge that may be important for learning.
   - For example, in the naive Bayes model, the conditional likelihood \(p(Y|X, \theta_m, m)\) is a simple generalized linear model, while the input likelihood \(p(X|\theta_m, m)\) is a mixture distribution.
3. **Learning Methods for Bayesian Models**:
   - The paper compares two approaches: Bayesian model averaging and model selection.
   - Model averaging makes predictions by combining all candidate model structures and their parameters, while model selection chooses one or a few "good" model structures for prediction.
4. **Model Selection Criteria for Regression/Classification Tasks**:
   - The authors discuss two Bayesian model selection criteria suited to regression/classification tasks: the Conditional Node Monitoring (CNM) criterion described by Spiegelhalter et al. (1993), and the Class-Sequence Criterion (CSC) described by Buntine (1993).
   - The paper contrasts the two criteria using the prequential framework of Dawid (1984) and gives sufficient conditions under which they agree, in particular for BRC models.
5. **Combination of Theory and Practice**:
   - Although the authors raise theoretical doubts about non-trivial BRC models, these models may still predict well in practice. In particular, when the nodes and their parent nodes are discrete, polynomial softmax regression may be useful.

In summary, this paper explores how to select and use Bayesian models, especially BRC models, in regression and classification tasks. It offers insights into these models and contrasts existing criteria for evaluating their selection and performance.
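
The contrast between model selection and model averaging described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the two candidate models (`m0`, where binary \(Y\) ignores binary \(X\), and `m1`, where \(Y\) has a separate Bernoulli rate for each value of \(X\)), the Beta(1, 1) priors, the uniform prior over models, and the toy data. Each model is scored by summing the log probability it assigns to each \(y\) given its \(x\) and the earlier cases, in the spirit of the prequential framework mentioned in the abstract; it is not claimed to reproduce either published criterion exactly.

```python
import math

def prequential_log_score(data, use_x):
    """Sum of log p(y_l | x_l, earlier cases) under Beta(1,1)-Bernoulli models.

    If use_x is False, one Bernoulli rate is shared by all cases (model m0);
    if True, each value of x gets its own rate (model m1).
    """
    counts = {}  # context -> (number of y=0 seen, number of y=1 seen)
    total = 0.0
    for x, y in data:
        key = x if use_x else None
        n0, n1 = counts.get(key, (0, 0))
        p1 = (1 + n1) / (2 + n0 + n1)  # posterior predictive for y = 1
        total += math.log(p1 if y == 1 else 1 - p1)
        counts[key] = (n0 + (y == 0), n1 + (y == 1))
    return total, counts

def predict(counts, x, use_x):
    """Posterior predictive p(y = 1 | x, all data) for one model."""
    n0, n1 = counts.get(x if use_x else None, (0, 0))
    return (1 + n1) / (2 + n0 + n1)

# Hypothetical toy data: (x, y) pairs.
data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1), (0, 1), (1, 1)]
models = {"m0": False, "m1": True}  # model name -> whether it conditions on x

scores, fitted = {}, {}
for name, use_x in models.items():
    scores[name], fitted[name] = prequential_log_score(data, use_x)

# Model selection: keep only the best-scoring structure.
best = max(scores, key=scores.get)

# Model averaging: weight each model by its posterior probability,
# assuming a uniform prior over the two models.
norm = sum(math.exp(s) for s in scores.values())
weights = {name: math.exp(s) / norm for name, s in scores.items()}

x_new = 1
p_selected = predict(fitted[best], x_new, models[best])
p_avg = sum(weights[n] * predict(fitted[n], x_new, models[n]) for n in models)

print(best, p_selected, p_avg)
```

With selection, the prediction for `x_new` comes from the single best-scoring model; with averaging, it is pulled toward the other model in proportion to that model's posterior weight.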

Models and Selection Criteria for Regression and Classification

Bayesian Variable Selection with Related Predictors

Bayesian Restricted Likelihood Methods: Conditioning on Insufficient Statistics in Bayesian Regression

Bayesian Model Selection for a Class of Spatially-Explicit Capture Recapture Models

Selection of Regression Models under Linear Restrictions for Fixed and Random Designs

Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).

Bayesian Model Selection Methods and Their Application to Biological ODE Systems

Bayesian variable selection in linear regression models with instrumental variables

Model Selection Criteria for Latent Growth Models Using Bayesian Methods

Bayesian Analysis of Binary and Polychotomous Response Data

Choosing models in model-based clustering and discriminant analysis

Bayesian Model Selection in High-Dimensional Settings

Reconciling the Bayes Factor and Likelihood Ratio for Two Non-Nested Model Selection Problems

Optimal Bayesian design for model discrimination via classification

Hierarchical approaches for flexible and interpretable binary regression models

Comparison of Bayesian predictive methods for model selection

On Supervised Selection of Bayesian Networks

Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection

A Fully Nonparametric Modelling Approach to Binary Regression

Embedded Bayesian Network Classifiers

The Best Fit Bayesian Hierarchical Generalized Linear Model Selection Using Information Complexity Criteria in the MCMC Approach