Feature Attribution Explanation to Detect Harmful Dataset Shift.

Ziming Wang, Changwu Huang, Xin Yao
DOI: https://doi.org/10.1109/ijcnn54540.2023.10191221
2023-01-01
Abstract: Detecting whether a distribution shift has occurred in a dataset is critical when deploying machine learning models, because even a small shift in the data distribution can substantially degrade a model's performance and cause the deployed model to fail. In this work, we focus on detecting harmful dataset shifts, i.e., shifts that are detrimental to the performance of the machine learning model. Existing methods usually detect a shift between two datasets with the following framework: first reduce the dimensionality of the datasets, then decide whether a shift exists by applying two-sample statistical test(s) to the reduced datasets. This framework does not use the knowledge contained in the model trained on the dataset. To address this, this paper proposes using explainable artificial intelligence (XAI) techniques to exploit the knowledge in trained models when detecting harmful dataset shifts. Specifically, we employ a feature attribution explanation (FAE) method to capture the knowledge in the model and combine it with a widely used two-sample test, maximum mean discrepancy (MMD), to detect harmful dataset shifts. Experimental results on more than twenty different shifts in three widely used image datasets demonstrate that the proposed method identifies harmful dataset shifts more effectively than existing methods. Moreover, experiments on several different models show that the method is robust across models, i.e., its detection performance is not sensitive to the model used.
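The pipeline described in the abstract (compute feature attributions with the trained model, then run an MMD two-sample test on them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the model is a toy linear scorer, gradient-times-input stands in for the unspecified FAE method, the kernel is an RBF with a median-heuristic bandwidth, and the names `squared_distances`, `mmd_permutation_test`, and `linear_attributions` are hypothetical.

```python
import numpy as np


def squared_distances(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding


def mmd_permutation_test(x, y, n_perm=200, seed=0):
    """Return (biased squared-MMD estimate, permutation p-value) for samples x, y."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n, total = len(x), len(x) + len(y)
    sq = squared_distances(pooled, pooled)
    gamma = 1.0 / np.median(sq[sq > 0])  # median heuristic (an assumed choice)
    k = np.exp(-gamma * sq)              # RBF kernel matrix on the pooled sample

    def statistic(idx):
        a, b = idx[:n], idx[n:]
        return (k[np.ix_(a, a)].mean() + k[np.ix_(b, b)].mean()
                - 2.0 * k[np.ix_(a, b)].mean())

    observed = statistic(np.arange(total))
    # Null distribution: recompute the statistic after shuffling sample labels.
    null = np.array([statistic(rng.permutation(total)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)


def linear_attributions(weights, x):
    """Gradient-times-input attributions for a linear score w . x (stand-in FAE)."""
    return x * weights


# Toy data: the hypothetical model uses feature 0 and ignores feature 1.
rng = np.random.default_rng(42)
weights = np.array([1.0, 0.0])
source = rng.normal(size=(100, 2))
ignored_shift = rng.normal(size=(100, 2)) + np.array([0.0, 2.0])  # shift in unused feature
used_shift = rng.normal(size=(100, 2)) + np.array([2.0, 0.0])     # shift in used feature

for name, target in [("ignored feature", ignored_shift), ("used feature", used_shift)]:
    _, p_raw = mmd_permutation_test(source, target)
    _, p_attr = mmd_permutation_test(linear_attributions(weights, source),
                                     linear_attributions(weights, target))
    print(f"shift in {name}: raw p={p_raw:.3f}, attribution p={p_attr:.3f}")
```

In this toy setting, a shift confined to a feature the model ignores contributes nothing to the attributions, so a test on attributions would be expected to react mainly to shifts in features the model relies on, while a test on raw inputs reacts to both. Whether this matches the paper's notion of a harmful shift in detail is not stated in the abstract.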