Model Selection with Informative Normalized Maximum Likelihood: Data Prior and Model Prior
Jun Zhang
DOI: https://doi.org/10.1142/9789814368018_0012
2011-09-01
Abstract: Over the last decade, model selection has grown rapidly as a tool for evaluating models of cognitive processes, ever since its introduction to the mathematical/cognitive psychology community (Myung, Forster, & Browne, 2000). The term “model selection” refers to the task of selecting, among several competing alternatives, the “best” statistical model given experimental data. To avoid ambiguity, “best” here has a now-standard operational definition: a model must not only fit existing data reasonably well but also be simple enough that it does not capture sampling noise in the data. This criterion, which treats generalization rather than fitting as the goal of modeling, embodies Occam’s Razor, the principle of explaining data parsimoniously with the fewest assumptions. Although mathematical implementations differ, yielding methods such as AIC, BIC, and MDL, each ultimately balances two aspects of model evaluation: one measures the model’s goodness-of-fit to existing data, and the other measures its complexity, or capacity for generalization.

The Minimum Description Length (MDL) Principle (Rissanen, 1978, 1983, 1996, 2001) is an information-theoretic approach to inductive inference with roots in algorithmic coding theory. It has become one of the most popular means of model selection (Grünwald, Myung, & Pitt, 2005; Grünwald, 2007). Under this approach, data are viewed as codes to be compressed by the model. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest descrip-
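To make the fit-versus-complexity trade-off concrete, here is a minimal sketch, not the chapter’s own method, comparing two hypothetical models of coin-flip data: M0, a fair coin with θ fixed at 0.5 (no free parameters), and M1, a Bernoulli model with free θ. It scores both with AIC, BIC, and the standard normalized maximum likelihood (NML) code length, −ln p(x | θ̂) + ln C_n, where C_n sums the maximized likelihood over every possible data set of size n. The models, the Bernoulli setting, and all function names are illustrative assumptions. In every criterion, lower is better; AIC and BIC are on the −2 ln L scale and NML is in nats, so scores should only be compared within a criterion.

```python
import math


def bernoulli_max_loglik(k: int, n: int) -> float:
    """ln p(x | theta_hat) for k successes in n trials, with theta_hat = k / n."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def log_binom(n: int, k: int) -> float:
    """ln of the binomial coefficient, computed via lgamma to avoid overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def bernoulli_log_complexity(n: int) -> float:
    """ln C_n: log of the maximized likelihood summed over all 2^n binary
    sequences, grouped by success count k (log-sum-exp for stability)."""
    terms = [log_binom(n, k) + bernoulli_max_loglik(k, n) for k in range(n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def scores(k: int, n: int) -> dict:
    """Return {criterion: (score_M0, score_M1)}; lower is better."""
    ll0 = n * math.log(0.5)          # M0: fixed theta = 0.5, zero free parameters
    ll1 = bernoulli_max_loglik(k, n)  # M1: one free parameter, fit by ML
    return {
        "AIC": (-2 * ll0, -2 * ll1 + 2 * 1),
        "BIC": (-2 * ll0, -2 * ll1 + 1 * math.log(n)),
        # NML for M0 has complexity ln 1 = 0, since it has no free parameters.
        "NML": (-ll0, -ll1 + bernoulli_log_complexity(n)),
    }


def choose(k: int, n: int) -> dict:
    """Pick the model with the lower score under each criterion."""
    return {name: ("M0" if s0 <= s1 else "M1") for name, (s0, s1) in scores(k, n).items()}


if __name__ == "__main__":
    for k in (50, 60, 90):
        print(k, choose(k, 100))
```

With balanced data (k = 50 of 100), the extra parameter of M1 buys no fit, so every criterion prefers the simpler M0; with strongly skewed data (k = 90), the gain in fit outweighs each complexity penalty and M1 wins. Unlike the fixed penalties of AIC and BIC, the NML penalty ln C_n is derived from the model’s capacity to fit all possible data sets of size n.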