Can Your Generative Model Detect Out-of-Distribution Covariate Shift?

Christiaan Viviers, Amaan Valiuddin, Francisco Caetano, Lemar Abdi, Lena Filatova, Peter de With, Fons van der Sommen
2024-10-09
Abstract: Detecting Out-of-Distribution (OOD) sensory data and covariate distribution shift aims to identify new test examples with different high-level image statistics to the captured, normal and In-Distribution (ID) set. Existing OOD detection literature largely focuses on semantic shift with little-to-no consensus over covariate shift. Generative models capture the ID data in an unsupervised manner, enabling them to effectively identify samples that deviate significantly from this learned distribution, irrespective of the downstream task. In this work, we elucidate the ability of generative models to detect and quantify domain-specific covariate shift through extensive analyses that involve a variety of models. To this end, we conjecture that it is sufficient to detect most occurring sensory faults (anomalies and deviations in global signal statistics) by solely modeling high-frequency signal-dependent and independent details. We propose a novel method, CovariateFlow, for OOD detection, specifically tailored to covariate heteroscedastic high-frequency image-components using conditional Normalizing Flows (cNFs). Our results on CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C demonstrate the effectiveness of the method by accurately detecting OOD covariate shift. This work contributes to enhancing the fidelity of imaging systems and aiding machine learning models in OOD detection in the presence of covariate shift.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **How can out-of-distribution (OOD) covariate shift in image data be detected and quantified?** Specifically, the authors focus on identifying such shift in high-dimensional image data when the high-level image statistics change while the semantic content stays the same.

### Problem Background

1. **Importance of OOD Detection**
   - When deploying accurate sensing technologies and reliable machine-learning systems, it is crucial to identify abnormal image statistics.
   - Existing OOD detection methods mainly focus on semantic shift, while covariate shift has received far less attention.
2. **Definition of Covariate Shift**
   - Covariate shift refers to the situation where the high-level statistical properties of an image (such as illumination or noise) change, but the semantic content (such as the object category) remains the same.
   - Such changes can degrade model prediction performance, and in specialized imaging applications they may indicate a system failure.

### Main Contributions of the Paper

1. **Proposing the CovariateFlow Method**
   - CovariateFlow is a new method based on conditional normalizing flows (cNFs), specifically designed to detect and quantify covariate shift in high-frequency heteroscedastic image components.
   - The image is decomposed into low-frequency and high-frequency components, and the conditional distribution of the high-frequency component given the low-frequency component is modeled, which lets CovariateFlow capture covariate shift more effectively.
2. **Introducing the Normalized Score Distance (NSD)**
   - NSD combines the log-likelihood (LL) and the typicality score to overcome the limitations of using either one alone.
   - Both the LL and the typicality score are standardized, and their absolute distances from the training-set mean are summed, so NSD gives a more complete indication of whether a sample is OOD.
3. **Experimental Verification**
   - The authors conducted extensive experiments on the CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C datasets, demonstrating the effectiveness of CovariateFlow.
   - The experimental results show that CovariateFlow detects covariate shift accurately, including across different types of image degradation.

### Formula Summary

1. **Log-Likelihood Formula of Normalizing Flows**
   \[ \log p(x)=\log p_Z(z_0)-\sum_{k = 1}^K\log\left|\det\frac{\partial f_k(z_{k - 1})}{\partial z_{k - 1}}\right| \]
   where \(z_0 = f^{-1}(x)\), \(z_k = f_k(z_{k - 1})\), and \(f = f_K\circ\dots\circ f_1\) is a composition of bijective transformations.
2. **Definition of the Typical Set**
   \[ H(X)-\epsilon\leq-\frac{1}{N}\sum_{n = 1}^N\log_2 p(x_n)\leq H(X)+\epsilon \]
   where \(H(X)\) is the Shannon entropy of the data distribution, and \(\epsilon\) is an arbitrarily small positive value.
3. **Normalized Score Distance (NSD)**
   \[ \text{NSD}(x^*)=\left|\frac{\log p(x^*) - \mu_L}{\sigma_L}\right|+\left|\frac{\|\nabla_x\log p(x^*)\| - \mu_T}{\sigma_T}\right| \]
   where \(\m\)
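
The NSD formula above can be illustrated with a minimal Python sketch. It assumes the log-likelihood and the gradient norm \(\|\nabla_x\log p(x)\|\) of each sample have already been produced by a trained flow; the function names `fit_nsd_stats` and `nsd`, the use of the population standard deviation, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def fit_nsd_stats(train_ll, train_grad_norm):
    """Estimate (mu_L, sigma_L, mu_T, sigma_T) from in-distribution scores.

    Assumption: population standard deviation is used; the paper's exact
    estimator is not stated in the summary above.
    """
    return (mean(train_ll), pstdev(train_ll),
            mean(train_grad_norm), pstdev(train_grad_norm))


def nsd(ll, grad_norm, stats):
    """Normalized Score Distance of one sample, following the formula above."""
    mu_l, sigma_l, mu_t, sigma_t = stats
    # Each term is an absolute z-score, so a sample that is unusually
    # likely scores as far from the reference as one that is unusually unlikely.
    return abs((ll - mu_l) / sigma_l) + abs((grad_norm - mu_t) / sigma_t)


# Hypothetical in-distribution scores (made-up values, illustration only).
train_ll = [-3.0, -2.0, -1.0]
train_grad_norm = [1.0, 2.0, 3.0]
stats = fit_nsd_stats(train_ll, train_grad_norm)

typical_score = nsd(-2.0, 2.0, stats)  # sample sitting exactly at both means
shifted_score = nsd(-6.0, 5.0, stats)  # sample far from both means
```

In this sketch a larger NSD means the sample lies further from the in-distribution statistics; the threshold at which a sample is flagged as OOD is left to the user.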