Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification

Xing Yan,Yonghua Su,Wenxuan Ma
DOI: https://doi.org/10.48550/arXiv.2211.14545
2023-05-29
Abstract: We propose a novel, succinct, and effective approach for distribution prediction to quantify uncertainty in machine learning. It incorporates adaptively flexible distribution prediction of $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$ in regression tasks. This conditional distribution's quantiles, at probability levels spread across the interval $(0,1)$, are boosted by additive models that we design with intuition and interpretability in mind. We seek an adaptive balance between structural integrity and flexibility for $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$: a Gaussian assumption lacks flexibility for real data, while highly flexible approaches (e.g., estimating the quantiles separately without a distribution structure) inevitably have drawbacks and may not generalize well. Our ensemble multi-quantiles approach, called EMQ, is totally data-driven, and can gradually depart from Gaussian and discover the optimal conditional distribution during boosting. On extensive regression tasks from UCI datasets, we show that EMQ achieves state-of-the-art performance compared with many recent uncertainty quantification methods. Visualization results further illustrate the necessity and the merits of such an ensemble model.
Machine Learning
What problem does this paper attempt to address?
This paper addresses uncertainty quantification in regression tasks in machine learning. Specifically, it aims to improve prediction of the conditional distribution \(P(y|X = x)\) so that the uncertainty in model predictions is quantified more accurately. Methods built on a Gaussian assumption lack flexibility on real data, while highly flexible methods (such as estimating quantiles independently, without relying on any distribution structure) may generalize poorly. The paper therefore proposes a new Ensemble Multi-Quantiles (EMQ) method that predicts the conditional distribution in an adaptively flexible manner, and it demonstrates superior performance on multiple datasets.

### Core problems of the paper

1. **Importance of uncertainty quantification**:
   - Although deep-learning models achieve state-of-the-art performance in many tasks, their uncertainty estimates are often over-confident, which may lead to high-risk decisions in practical applications.
   - Accurate uncertainty estimation lets a model defer decisions to human experts when uncertainty is high, or hand control to a human operator in scenarios such as autonomous driving.
2. **Limitations of existing methods**:
   - Methods that assume a Gaussian distribution lack flexibility and cannot capture multimodality, asymmetry, or heavy tails in real data.
   - Highly flexible methods (such as non-parametric methods) may over-fit and produce density functions that are difficult to interpret.
3. **Solution proposed in the paper**:
   - A new Ensemble Multi-Quantiles (EMQ) method is proposed to predict the conditional distribution \(P(y|X = x)\) by adaptively balancing distribution structure and flexibility.
   - The method starts from a Gaussian distribution and gradually adjusts the quantile predictions to better fit the distribution characteristics of real data.
   - An adaptive T-strategy determines the number of ensemble steps dynamically, thereby finding the best balance between distribution structure and flexibility.

### Main contributions

1. **Novel ensemble-learning method**:
   - A concise and effective method is proposed to predict the conditional distribution by adaptively balancing distribution structure (such as a Gaussian distribution) and flexibility.
2. **Naturally overcomes the quantile-crossing problem**:
   - Without additional effort (such as constrained optimization or post-processing), EMQ avoids the quantile-crossing problem in multi-quantile estimation.
3. **Superior experimental performance**:
   - Experimental results on multiple datasets show that EMQ outperforms many existing uncertainty quantification methods in terms of calibration and sharpness, including methods based on Gaussian assumptions, Bayesian methods, quantile regression, and traditional tree models.
4. **Adaptive flexibility**:
   - Experiments verify that EMQ can adaptively perform flexible distribution prediction, especially when using the adaptive T-strategy.
5. **Wide applicability**:
   - The method captures different types of data-distribution characteristics, including peakedness, asymmetry, long tails, and multimodality, which supports its necessity and advantages on complex real-world data.

In conclusion, this paper addresses the trade-off between flexibility and distribution structure in existing uncertainty quantification methods by proposing EMQ, which provides more accurate and reliable uncertainty estimates for regression tasks in machine learning.
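
The idea of starting from Gaussian quantiles and adjusting them without introducing quantile crossing can be illustrated with a minimal sketch. This is not the EMQ algorithm itself: the summary does not spell out the update rule, so the gap-rescaling step below (`rescale_gaps`) and all function names are assumptions chosen only to show why a parameterization in terms of positive gaps cannot produce crossed quantiles. The pinball loss is the standard quantile-regression loss and is included to show how a single quantile prediction is typically scored.

```python
from statistics import NormalDist


def gaussian_quantiles(mu, sigma, levels):
    """Quantiles of N(mu, sigma^2) at the given probability levels."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(tau) for tau in levels]


def rescale_gaps(quantiles, scales):
    """Hypothetical adjustment step (an assumption, not the paper's update).

    Keeps the lowest quantile fixed and multiplies each gap between
    neighbouring quantiles by a strictly positive factor. Positive gaps
    stay positive, so the result is still sorted (no quantile crossing).
    """
    if len(scales) != len(quantiles) - 1:
        raise ValueError("need one scale per gap between quantiles")
    if any(s <= 0 for s in scales):
        raise ValueError("scales must be strictly positive")
    adjusted = [quantiles[0]]
    for lower, upper, scale in zip(quantiles, quantiles[1:], scales):
        adjusted.append(adjusted[-1] + scale * (upper - lower))
    return adjusted


def has_crossing(quantiles):
    """True if any quantile is smaller than the one before it."""
    return any(b < a for a, b in zip(quantiles, quantiles[1:]))


def pinball_loss(y, q, tau):
    """Standard pinball (quantile) loss for one observation y and prediction q."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)


levels = [0.1, 0.25, 0.5, 0.75, 0.9]
base = gaussian_quantiles(0.0, 1.0, levels)       # symmetric Gaussian start
skewed = rescale_gaps(base, [0.5, 0.8, 1.5, 3.0])  # stretch the right tail

print("Gaussian quantiles:", [round(q, 3) for q in base])
print("Adjusted quantiles:", [round(q, 3) for q in skewed])
print("Crossing after adjustment:", has_crossing(skewed))
```

Because the adjustment acts on gaps rather than on each quantile independently, monotonicity holds by construction, which is one simple way to obtain the "no constrained optimization or post-processing" property mentioned above. How EMQ actually achieves it should be checked against the paper.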