Joint Maximum Margin and Maximum Entropy Learning of Graphical Models

Jun Zhu
2010-01-01
Abstract: Inferring structured predictions based on correlated covariates remains a central problem in many fields, including NLP, computer vision, and computational biology. Typically, both the input covariates and the output predictions can be high-dimensional, multi-modal, noisy, and partially observable, and can bear latent structures; each of these characteristics adds a degree of complexity to the task of learning structured input/output (I/O) models. Several recent approaches to structured I/O learning are based on learning discriminative probabilistic graphical models. By defining composite features that explicitly exploit the structured dependencies among input elements (e.g., words in a sentence) and among the interpretational outputs (e.g., part-of-speech tags), such models can produce semantically consistent predictions from complex inputs. However, how to train such models properly remains a highly contested issue. The two dominant paradigms for training such models are maximum (conditional) likelihood estimation (MLE) [6], which leads to the well-known CRF, and max-margin learning [9], [10], which leads to the M3N. While both methods have enjoyed remarkable success and are widely used, they have a number of deficiencies, as we will discuss below. In this project, we introduce a new paradigm for learning structured I/O models, and graphical models in general, that conjoins and extends the merits of MLE and max-margin learning while avoiding their shortcomings.