Using Stratified Sampling to Improve LIME Image Explanations

Muhammad Rashid, Elvio G. Amparore, Enrico Ferrari, Damiano Verda
2024-03-26
Abstract: We investigate the use of a stratified sampling approach for LIME Image, a popular model-agnostic explainable AI method for computer vision tasks, in order to reduce the artifacts generated by typical Monte Carlo sampling. Such artifacts are due to the undersampling of the dependent variable in the synthetic neighborhood around the image being explained, which may result in inadequate explanations due to the impossibility of fitting a linear regressor on the sampled data. We then highlight a connection with the Shapley theory, where similar arguments about undersampling and sample relevance were suggested in the past. We derive all the formulas and adjustment factors required for an unbiased stratified sampling estimator. Experiments show the efficacy of the proposed approach.
Artificial Intelligence
### What problem does this paper attempt to solve?

This paper examines the sampling strategy used by LIME (Local Interpretable Model-agnostic Explanations) for image explanations. Specifically:

1. **Limitations of LIME Image Explanation**:
   - LIME Image builds its synthetic neighborhood with Monte Carlo sampling, which can distribute samples unevenly across that neighborhood, especially when the number of superpixels is large. The linear regressor then fits the sampled data poorly, which leads to inaccurate explanations.
2. **Shortcomings of Monte Carlo Sampling**:
   - As the number of superpixels increases, Monte Carlo samples concentrate around the middle of the range, where roughly half of the superpixels are masked, and the extreme cases are rarely drawn. Local behavior (i.e., samples close to the input image) is therefore severely underrepresented in the synthetic neighborhood, resulting in poor explanations.
3. **Issues with Dependent Variable Distribution**:
   - When the number of superpixels increases, the distribution of the dependent variable (i.e., the classification scores) becomes flat, so the linear regressor cannot fit the data well and the resulting explanations are confusing.

### Solution

1. **Stratified Sampling Strategy**:
   - The paper proposes a sampling method based on stratification. Samples within each stratum have an equal probability of being selected, so the whole sample space is covered more evenly and the quality of the explanations improves.
2. **Adjustment Factor**:
   - Stratified sampling oversamples some data points relative to Monte Carlo sampling. To correct the resulting bias, the paper introduces an adjustment factor that reweights the oversampled points, so that the linear regressor is fitted on an unbiased sample.
3. **Experimental Validation**:
   - A series of experiments compares stratified sampling with traditional Monte Carlo sampling under different numbers of superpixels and shows that the stratified approach performs better.

In summary, this paper aims to improve the accuracy and stability of LIME Image explanations by improving the sampling strategy.
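
The sampling issue and the stratified remedy summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that strata are defined by the number of kept superpixels (the summary does not say how the strata are chosen), and it uses a plain importance weight as a stand-in for the paper's adjustment factor. The function names `monte_carlo_masks` and `stratified_masks` are hypothetical.

```python
from math import comb

import numpy as np


def monte_carlo_masks(n_samples, k, rng):
    """Monte Carlo sampling: each of the k superpixels is kept (1) or
    masked (0) independently with probability 1/2."""
    return rng.integers(0, 2, size=(n_samples, k))


def stratified_masks(n_samples, k, rng):
    """Stratified sampling sketch (assumption: one stratum per number of
    kept superpixels, 0..k, with strata drawn uniformly).

    Returns the masks and a per-sample weight that rebalances the strata
    toward their Monte Carlo probabilities (a simple importance weight,
    used here in place of the paper's adjustment factor).
    """
    sizes = rng.integers(0, k + 1, size=n_samples)  # stratum of each sample
    masks = np.zeros((n_samples, k), dtype=int)
    for i, s in enumerate(sizes):
        kept = rng.permutation(k)[:s]  # s distinct superpixels, uniformly
        masks[i, kept] = 1
    # Probability of each stratum under Monte Carlo sampling: C(k, s) / 2^k.
    p_mc = np.array([comb(k, s) / 2**k for s in range(k + 1)])
    # Probability of each stratum under the stratified scheme: 1 / (k + 1).
    weights = p_mc[sizes] * (k + 1)
    return masks, weights


rng = np.random.default_rng(0)
k = 100  # number of superpixels
mc = monte_carlo_masks(1000, k, rng)
st, w = stratified_masks(1000, k, rng)

# Range of "number of kept superpixels" covered by each scheme.
print("Monte Carlo:", mc.sum(axis=1).min(), "to", mc.sum(axis=1).max())
print("Stratified: ", st.sum(axis=1).min(), "to", st.sum(axis=1).max())
```

With 100 superpixels, the Monte Carlo counts follow a Binomial(100, 1/2) distribution with standard deviation 5, so almost every mask keeps between about 35 and 65 superpixels, while the stratified draw spreads over the full range from 0 to 100. The weights `w` could be passed as sample weights to a weighted linear regressor, which is one way to keep the fit from being dominated by the oversampled extreme strata.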