Towards A Rigorous Science of Interpretable Machine Learning

Finale Doshi-Velez, Been Kim
2017-03-03
Abstract: As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty of evaluating criteria such as safety and fairness, or of avoiding technical debt, when machine-learning systems used in practical applications are not interpretable. Because these criteria often cannot be fully quantified, interpretability becomes particularly important. At present, however, there is no consensus on how to define or evaluate interpretability, which makes it difficult to compare methods or to know when they generalize.

Specifically, the paper points out the importance of interpretability in the following situations:

- **Scientific understanding**: Helping humans acquire knowledge.
- **Safety**: Ensuring that systems performing complex tasks do not fail under unforeseen circumstances.
- **Ethics**: Preventing certain types of discrimination and ensuring the fairness of the system.
- **Goal mismatch**: When the objective the algorithm optimizes is not fully consistent with the final goal, interpretability can help identify the mismatch.
- **Multi-goal trade-off**: When multiple optimization goals compete with each other, interpretability can help reveal the dynamics of the trade-offs.

To address these problems, the paper proposes a taxonomy for interpretability evaluation:

1. **Application-grounded evaluation**: Conduct human experiments within a real application to evaluate how well the system performs on the specific task.
2. **Human-grounded evaluation**: Evaluate the quality of explanations through human experiments on simplified tasks.
3. **Functionally-grounded evaluation**: Use no human experiments; instead, use a formal definition of interpretability as a proxy for explanation quality.
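To make the idea of a functionally-grounded proxy concrete, here is a minimal sketch in Python. The two proxies shown (the number of non-negligible weights in a linear model, and the depth of a decision tree) are illustrative assumptions, not metrics prescribed by the paper. The function names and the tuple-based tree encoding are hypothetical.

```python
from typing import Sequence, Union

# Hypothetical tree encoding used only for this sketch:
# a leaf is a label string; an internal node is (feature, left, right).
Tree = Union[str, tuple]


def sparsity_proxy(weights: Sequence[float], tol: float = 1e-8) -> int:
    """Count the features a linear model actually uses (non-negligible weights)."""
    return sum(1 for w in weights if abs(w) > tol)


def tree_depth_proxy(tree: Tree) -> int:
    """Depth of a decision tree: the longest chain of tests before a leaf."""
    if isinstance(tree, str):
        return 0
    _feature, left, right = tree
    return 1 + max(tree_depth_proxy(left), tree_depth_proxy(right))


dense_weights = [0.4, -1.2, 0.0, 0.7, 0.05]
sparse_weights = [0.0, -1.1, 0.0, 0.0, 0.0]
small_tree = ("age", "low risk", ("dose", "low risk", "high risk"))

print(sparsity_proxy(dense_weights))   # 4
print(sparsity_proxy(sparse_weights))  # 1
print(tree_depth_proxy(small_tree))    # 2
```

Under these proxies, a lower score is taken to mean "more interpretable" without involving any human subjects. Whether such a proxy tracks explanation quality in a real application is exactly the kind of question the other two evaluation types would have to answer.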
Finally, the paper discusses several open questions: which proxies are suitable for which real-world applications, what factors matter when designing simplified tasks that preserve the essence of the real task, and what factors matter when characterizing proxies for explanation quality. Answering these questions requires further research to establish formal links among the evaluation approaches.