Models and Selection Criteria for Regression and Classification

David Heckerman, Christopher Meek
DOI: https://doi.org/10.48550/arXiv.1302.1545
2013-02-06
Abstract: When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for regression/classification: one described by Spiegelhalter et al. (1993), and another by Buntine (1993). We contrast these two criteria using the prequential framework of Dawid (1984), and give sufficient conditions under which the criteria agree.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to select appropriate Bayesian models for regression and classification tasks. Specifically, the authors focus on the conditional probability distribution \(p(Y|X)\), where \(Y\) is the outcome or class variable and \(X\) is the set of explanatory or input variables. The paper explores the following issues:

1. **Definition and Characteristics of Bayesian Regression/Classification (BRC) Models**:
   - The authors introduce a special class of Bayesian models, called Bayesian Regression/Classification (BRC) models. These models can be factored into independent conditional \(p(Y|X)\) and input \(p(X)\) models.
   - The advantage of BRC models is that the conditional model, the part of the full model that matters for prediction, can be analyzed by itself, which simplifies both computation and interpretation.
2. **Conversion from Any Bayesian Model to BRC Models**:
   - The paper discusses the practice of converting an arbitrary Bayesian model to a BRC model and argues that this practice is often inappropriate, because it ignores prior knowledge that may be important for learning.
   - For example, in the Naive Bayes model, the conditional likelihood \(p(Y|X, \theta_m, m)\) is a simple generalized linear model, while the input likelihood \(p(X|\theta_m, m)\) is a mixture distribution. Both are written in terms of the same parameters \(\theta_m\).
3. **Learning Methods for Bayesian Models**:
   - The paper compares two methods: Bayesian model averaging and model selection.
   - Model averaging predicts by combining all possible model structures and their parameters, while model selection chooses one or a few "good" model structures for prediction.
4. **Model Selection Criteria for Regression/Classification Tasks**:
   - The authors discuss two Bayesian model selection criteria suited to regression/classification tasks: Conditional Node Monitoring (CNM), described by Spiegelhalter et al. (1993), and the Class-Sequence Criterion (CSC), described by Buntine (1993).
   - The paper contrasts the two criteria using the prequential framework of Dawid (1984) and gives sufficient conditions under which they agree, especially in BRC models.
5. **Combination of Theory and Practice**:
   - Although the authors raise theoretical doubts about non-trivial BRC models, these models may still predict well in practice. In particular, when the nodes and their parent nodes are discrete, polynomial softmax regression may be useful.

In summary, this paper examines how to select and use Bayesian models, especially BRC models, in regression and classification tasks. It contrasts two model selection criteria for this setting and identifies conditions under which they agree.
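
The Naive Bayes example in item 2 can be checked numerically. The sketch below is an illustration rather than code from the paper: it assumes a binary class \(Y\) and three binary inputs, and the parameter values (`PRIOR_Y1`, `THETA`) and all function names are hypothetical. It shows that the conditional \(p(Y=1|x)\) obtained from Bayes' rule coincides with a logistic regression (a generalized linear model) whose weights are functions of the Naive Bayes parameters, and that the input likelihood \(p(x)\) is a two-component mixture over the class.

```python
import itertools
import math

# Hypothetical parameters, chosen only for illustration.
PRIOR_Y1 = 0.3                      # p(Y = 1)
THETA = {                           # THETA[y][i] = p(X_i = 1 | Y = y)
    0: [0.2, 0.6, 0.5],
    1: [0.7, 0.1, 0.9],
}


def joint(x, y):
    """p(x, y) under Naive Bayes: class prior times independent Bernoulli inputs."""
    p = PRIOR_Y1 if y == 1 else 1.0 - PRIOR_Y1
    for x_i, t in zip(x, THETA[y]):
        p *= t if x_i == 1 else 1.0 - t
    return p


def input_likelihood(x):
    """p(x): a mixture of the two class-conditional input distributions."""
    return joint(x, 0) + joint(x, 1)


def posterior_bayes(x):
    """p(Y = 1 | x) computed directly with Bayes' rule."""
    return joint(x, 1) / input_likelihood(x)


def logistic_params():
    """Weights and bias of the logistic model implied by the same parameters."""
    pairs = list(zip(THETA[0], THETA[1]))
    weights = [
        math.log(t1 / t0) - math.log((1.0 - t1) / (1.0 - t0)) for t0, t1 in pairs
    ]
    bias = math.log(PRIOR_Y1 / (1.0 - PRIOR_Y1)) + sum(
        math.log((1.0 - t1) / (1.0 - t0)) for t0, t1 in pairs
    )
    return weights, bias


def posterior_logistic(x):
    """p(Y = 1 | x) computed as sigmoid(bias + weights . x)."""
    weights, bias = logistic_params()
    z = bias + sum(w * x_i for w, x_i in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


ALL_X = list(itertools.product([0, 1], repeat=3))

# Largest disagreement between the two ways of computing p(Y = 1 | x).
max_gap = max(abs(posterior_bayes(x) - posterior_logistic(x)) for x in ALL_X)

# The mixture p(x) is a proper distribution over the eight input configurations.
total_input_mass = sum(input_likelihood(x) for x in ALL_X)

print(f"max |Bayes - logistic| = {max_gap:.2e}")
print(f"sum of p(x) over all x = {total_input_mass:.6f}")
```

Because the logistic weights and the mixture \(p(x)\) are both computed from the same `THETA`, data about the inputs alone carries information about the conditional model. This is one way to see why treating the two parts as independent, as a BRC model does, can discard knowledge encoded in the original model.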