Distributionally Robust Graphical Models

Rizal Fathony, Ashkan Rezaei, Mohammad Ali Bashiri, Xinhua Zhang, Brian D. Ziebart
DOI: https://doi.org/10.48550/arXiv.1811.02728
2018-11-07
Abstract: In many structured prediction problems, complex relationships between variables are compactly defined using graphical structures. The most prevalent graphical prediction methods (probabilistic graphical models and large-margin methods) have their own distinct strengths but also possess significant drawbacks. Conditional random fields (CRFs) are Fisher consistent, but they do not permit integration of customized loss metrics into their learning process. Large-margin models, such as structured support vector machines (SSVMs), have the flexibility to incorporate customized loss metrics, but lack Fisher consistency guarantees. We present adversarial graphical models (AGM), a distributionally robust approach for constructing a predictor that performs robustly for a class of data distributions defined using a graphical structure. Our approach enjoys both the flexibility of incorporating customized loss metrics into its design as well as the statistical guarantee of Fisher consistency. We present exact learning and prediction algorithms for AGM with time complexity similar to existing graphical models and show the practical benefits of our approach with experiments.
Machine Learning, Artificial Intelligence
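The abstract contrasts Fisher consistency with the ability to use customized loss metrics. A small example makes the link concrete; it is an illustration and does not come from the paper. Take a single output variable whose conditional distribution P(y | x) is known. The prediction that minimizes expected loss (the Bayes decision) then depends on the loss. Under zero-one loss it is the most probable label. Under the ordinal absolute loss |ŷ - y| it is a median label. The three labels, the probabilities, and the helper names (`expected_losses`, `bayes_prediction`) are all hypothetical.

```python
"""Bayes-optimal prediction under different losses (illustration only).

Fisher consistency means that, with the true P(y | x) and fully expressive
features, the learned predictor recovers the expected-loss minimizer below.
"""
from typing import Callable, List, Sequence

Loss = Callable[[int, int], float]


def zero_one_loss(y_hat: int, y: int) -> float:
    # Every mistake costs the same.
    return 0.0 if y_hat == y else 1.0


def absolute_loss(y_hat: int, y: int) -> float:
    # Ordinal loss: the cost grows with the distance between labels.
    return float(abs(y_hat - y))


def expected_losses(p: Sequence[float], loss: Loss) -> List[float]:
    """Expected loss of each candidate prediction y_hat when P(y | x) = p."""
    labels = range(len(p))
    return [sum(p[y] * loss(y_hat, y) for y in labels) for y_hat in labels]


def bayes_prediction(p: Sequence[float], loss: Loss) -> int:
    """Label with minimal expected loss (ties go to the smaller label)."""
    risks = expected_losses(p, loss)
    return min(range(len(risks)), key=risks.__getitem__)


if __name__ == "__main__":
    labels = ["neutral", "increasing", "peak"]
    p = [0.35, 0.25, 0.40]  # hypothetical P(y | x) for one image
    print(labels[bayes_prediction(p, zero_one_loss)])  # peak (the mode)
    print(labels[bayes_prediction(p, absolute_loss)])  # increasing (the median)
```

With these numbers the expected absolute losses are 1.05, 0.75 and 0.95, so the absolute-loss decision is "increasing" even though "peak" is the most probable label. A learner that is Fisher consistent only for zero-one loss would therefore not minimize an ordinal evaluation metric. This is the gap that the abstract says AGM closes.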
What problem does this paper attempt to address?
The paper addresses the trade-off between the two dominant approaches to structured prediction. The first is probabilistic graphical models, such as conditional random fields (CRFs). The second is large-margin methods, such as structured support vector machines (SSVMs). Specifically:

- **Conditional random fields (CRFs)**: These models are Fisher consistent. Under ideal learning conditions (the true data distribution and a fully expressive feature representation), they produce predictions that minimize the expected loss. However, CRFs cannot incorporate customized evaluation loss metrics into the training process.
- **Structured support vector machines (SSVMs)**: These models can incorporate customized evaluation loss metrics directly into the training optimization. However, they lack Fisher consistency guarantees in the multiclass setting.

To overcome the limitations of both approaches, the paper proposes **adversarial graphical models (AGM)**. AGM is a distributionally robust method that constructs a predictor that performs robustly for a class of data distributions defined by a graphical structure. It offers both the flexibility to incorporate customized loss metrics and the statistical guarantee of Fisher consistency.

### Main contributions of the paper

1. **Proposing adversarial graphical models (AGM)**:
   - AGM finds a predictor through a robust adversarial formulation. The predictor minimizes a loss metric under the worst-case distribution consistent with statistical summaries of the empirical distribution.
   - In this formulation, the empirical training data are replaced by an adversary. The adversary may freely choose the evaluation distribution from the set of distributions that match the statistical summaries (feature expectations) of the training data.
2. **Theoretical guarantees**:
   - The AGM framework accommodates a range of loss metrics and provides a Fisher consistency guarantee for the chosen loss metric.
   - Through this robust adversarial formulation, AGM aligns the training objective more closely with the evaluation loss metric while keeping the objective convex.
3. **Efficient algorithms**:
   - The paper presents exact learning and prediction algorithms for graphical structures with low treewidth. Their time complexity is similar to that of existing graphical models.
   - Experimental results show that AGM outperforms previous models on structured prediction tasks.

### Mathematical formulas

- **Adversarial prediction method**: the predictor \(\hat{P}\) minimizes the expected loss against an adversarial distribution \(\check{P}\):
  \[ \min_{\hat{P}(\hat{y}|x)} \max_{\check{P}(\check{y}|x)} \mathbb{E}_{X \sim \tilde{P}; \hat{Y}|X \sim \hat{P}; \check{Y}|X \sim \check{P}}\left[\text{loss}(\hat{Y}, \check{Y})\right] \]
  The maximization is subject to the constraint that the adversary matches the empirical feature expectations \(\tilde{\Phi} = \mathbb{E}_{X, Y \sim \tilde{P}}[\Phi(X, Y)]\):
  \[ \mathbb{E}_{X \sim \tilde{P}; \check{Y}|X \sim \check{P}}\left[\Phi(X, \check{Y})\right] = \tilde{\Phi} \]
- **Dual formulation**: introducing Lagrange multipliers for the feature-matching constraint, with parameters \(\theta_e\) for edge features and \(\theta_v\) for node features, gives:
  \[ \min_{\theta_e, \theta_v} \mathbb{E}_{X, Y \sim \tilde{P}} \max_{\check{P}(\check{y}|x)} \min_{\hat{P}(\hat{y}|x)} \left[ \sum_i \sum_{\hat{y}_i, \check{y}_i} \hat{P}(\hat{y}_i|x) \check{P}(\check{y}_i|x) \text{loss}(\hat{y}_i, \check{y}_i) + \cdots \right] \]

### Experimental verification

The paper evaluates AGM experimentally on two tasks:

1. **Facial emotion intensity prediction**:
   - Given a sequence of facial images, the task is to predict the emotion intensity of each image.
   - The emotion intensity labels take three ordered values: neutral < increasing < peak.