An Efficient Model-Agnostic Approach for Uncertainty Estimation in Data-Restricted Pedometric Applications

Viacheslav Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller
2024-09-18
Abstract: This paper introduces a model-agnostic approach designed to enhance uncertainty estimation in the predictive modeling of soil properties, a crucial factor for advancing pedometrics and the practice of digital soil mapping. For addressing the typical challenge of data scarcity in soil studies, we present an improved technique for uncertainty estimation. This method is based on the transformation of regression tasks into classification problems, which not only allows for the production of reliable uncertainty estimates but also enables the application of established machine learning algorithms with competitive performance that have not yet been utilized in pedometrics. Empirical results from datasets collected from two German agricultural fields showcase the practical application of the proposed methodology. Our results and findings suggest that the proposed approach has the potential to provide better uncertainty estimation than the models commonly used in pedometrics.
Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to address the issue of uncertainty estimation in soil property prediction, particularly in situations with limited data. Specifically:

1. **Data Scarcity**: Soil studies often face the problem of insufficient sample sizes because soil sampling is an expensive and time-consuming process. This results in smaller training datasets, limiting the performance of predictive models.
2. **Uncertainty Estimation**: In digital soil mapping, the uncertainty of model predictions is a critical factor, especially when users such as farmers rely on these predictions for decision-making. Reliable uncertainty measures are essential for building confidence in model outputs and supporting informed actions.

To address these issues, the paper proposes a model-agnostic approach to estimate uncertainty, enabling models to directly output uncertainty estimates without the need for additional calibration datasets. This approach not only avoids further reducing the size of the training dataset, which is advantageous in data-scarce situations, but also introduces machine learning algorithms that have not yet been applied in the field of soil science.

### Method Overview

The paper proposes a general adapter that converts regression tasks into classification problems, thereby utilizing classification algorithms for regression. The specific steps are as follows:

1. **Target Discretization**: Divide the continuous target variable into multiple intervals (referred to as "bins").
2. **Classification Model Training**: Train a classification model to minimize categorical cross-entropy.
3. **Continuous Prediction Reconstruction**: Reconstruct continuous predictions from the output probabilities of the trained classifier.
4. **Model Uncertainty Estimation**: Calculate the standard deviation of the bin structures as a proxy for the model's intrinsic uncertainty.

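The steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`make_bin_edges`, `to_class_labels`, `reconstruct`) are hypothetical, equal-width binning is only one possible strategy, and using bin centers as representative values is an assumption. Step 4 is read here as the standard deviation of the predicted distribution over the bins, which is one plausible interpretation. Step 2 is left out because any classifier that outputs per-class probabilities could supply `probs`.

```python
from bisect import bisect_right
from math import sqrt


def make_bin_edges(y, n_bins):
    """Equal-width bin edges spanning the training targets (one possible strategy)."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def to_class_labels(y, edges):
    """Step 1: map each continuous target to the index of the bin containing it."""
    last = len(edges) - 2  # index of the final bin
    # bisect_right counts the edges <= v; clamp so that out-of-range values and
    # the maximum target fall into the first or last bin.
    return [min(max(bisect_right(edges, v) - 1, 0), last) for v in y]


def bin_centers(edges):
    """Representative value of each bin (assumption: the bin midpoint)."""
    return [(a + b) / 2 for a, b in zip(edges, edges[1:])]


def reconstruct(probs, centers):
    """Steps 3 and 4: mean and standard deviation of the predicted bin distribution."""
    mean = sum(p * c for p, c in zip(probs, centers))
    var = sum(p * (c - mean) ** 2 for p, c in zip(probs, centers))
    return mean, sqrt(var)


# Hypothetical target values, for illustration only.
y_train = [0.0, 0.5, 1.2, 1.9, 2.5, 3.0, 3.4, 4.0]
edges = make_bin_edges(y_train, n_bins=4)
labels = to_class_labels(y_train, edges)  # class labels a classifier would be trained on

# Hypothetical class probabilities for one new sample, as any classifier might return.
probs = [0.1, 0.6, 0.3, 0.0]
mean, std = reconstruct(probs, bin_centers(edges))
print(labels, round(mean, 3), round(std, 3))
```

Because the adapter only needs class labels going in and class probabilities coming out, the classifier in step 2 can be swapped freely, which is what makes the approach model-agnostic.
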
Additionally, the paper employs an ensemble method that combines model predictions obtained with different bin sizes and binning strategies to make the results more robust and reliable.

### Experimental Results

The proposed Binned Uncertainty Estimation Ensemble method performs well on datasets from two agricultural sites in Germany, particularly for uncertainty estimation of soil organic carbon (SOC) predictions. It gives the best results when combined with the TabPFN and CatBoost models, where it achieves the lowest CRPS (continuous ranked probability score) values. The method also provides a visual representation of prediction reliability.
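To make the ensemble and the CRPS metric concrete, here is a small sketch under stated assumptions. The summary does not say how ensemble members are combined, so the equal-weight mixture in `mix_members` is an assumption, and both function names are hypothetical. `crps_discrete` uses the standard identity CRPS = E|X - y| - 0.5 E|X - X'| for a distribution concentrated on a finite set of points (here, bin centers); lower values are better.

```python
def mix_members(members):
    """Equal-weight mixture of binned distributions that may use different bins.

    Each member is a (probs, centers) pair; equal weighting is an assumption.
    """
    weight = 1.0 / len(members)
    probs, centers = [], []
    for member_probs, member_centers in members:
        probs.extend(weight * p for p in member_probs)
        centers.extend(member_centers)
    return probs, centers


def crps_discrete(probs, centers, y):
    """CRPS of a discrete predictive distribution against the observed value y."""
    # Expected absolute error between the prediction and the observation.
    term_obs = sum(p * abs(c - y) for p, c in zip(probs, centers))
    # Expected absolute difference between two independent draws (spread term).
    term_spread = sum(
        pi * pj * abs(ci - cj)
        for pi, ci in zip(probs, centers)
        for pj, cj in zip(probs, centers)
    )
    return term_obs - 0.5 * term_spread


# Two hypothetical ensemble members with different binnings.
members = [
    ([0.5, 0.5], [0.0, 2.0]),  # coarse member: two bins
    ([1.0], [1.0]),            # member that puts all mass on one bin
]
probs, centers = mix_members(members)
score = crps_discrete(probs, centers, y=1.0)
print(score)
```

A prediction that places all of its probability on the observed value scores zero, so the metric rewards distributions that are both sharp and centered on the truth.
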