Towards Explainable Model Extraction Attacks
Anli Yan, Ruitao Hou, Xiaozhang Liu, Hongyang Yan, Teng Huang, Xianmin Wang
DOI: https://doi.org/10.1002/int.23022
IF: 8.993
2022-01-01
International Journal of Intelligent Systems
Abstract: One key factor in boosting the application of artificial intelligence (AI) in security-sensitive domains is using it responsibly, which includes providing explanations for AI decisions. To date, a plethora of explainable artificial intelligence (XAI) techniques have been proposed to help users interpret model decisions. However, because explanations are data-driven, they themselves carry a high risk of exposing private information. In this paper, we first show that existing XAI is vulnerable to model extraction attacks and then present an XAI-aware dual-task model extraction attack (DTMEA). DTMEA attacks a target model that offers explanation services; that is, it extracts both the classification task and the explanation task of the target model. More specifically, the substitute model extracted by DTMEA is a multitask learning architecture consisting of a shared layer and two task-specific layers, one for classification and one for explanation. To reveal which explanation techniques are more likely to expose private information, we conduct an empirical evaluation of four major explanation types on benchmark data sets. Experimental results show that the attack accuracy of DTMEA exceeds that of the prediction-only method by up to 1.25%, 1.53%, 9.25%, and 7.45% on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively. By exposing the potential threats to explanation techniques, our research offers insights for developing effective tools that can trade off explainability against privacy in security-sensitive settings.
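The dual-task structure described in the abstract (one shared layer feeding a classification layer and an explanation layer) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the class name `DualTaskSubstitute`, the layer sizes, the choice of a ReLU shared layer, and the combined loss (cross-entropy on the victim's predicted label plus a weighted mean squared error on the victim's explanation) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: x is a list of floats, w holds one weight row per output."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


def init(n_out, n_in, rng):
    """Small random weights and zero biases (an arbitrary choice for the sketch)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class DualTaskSubstitute:
    """Hypothetical substitute model: shared layer + two task-specific heads."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.shared = init(n_hidden, n_in, rng)        # shared layer
        self.cls_head = init(n_classes, n_hidden, rng)  # classification head
        self.exp_head = init(n_in, n_hidden, rng)       # explanation head

    def forward(self, x):
        h = relu(linear(x, *self.shared))
        probs = softmax(linear(h, *self.cls_head))
        # Assumed: one attribution score per input feature (e.g. per pixel).
        explanation = linear(h, *self.exp_head)
        return probs, explanation


def dual_task_loss(probs, explanation, victim_label, victim_explanation, weight=1.0):
    """Assumed objective: cross-entropy on the victim's label plus weighted MSE
    between the substitute's explanation and the victim's explanation."""
    ce = -math.log(probs[victim_label] + 1e-12)
    mse = sum((a - b) ** 2 for a, b in zip(explanation, victim_explanation))
    mse /= len(explanation)
    return ce + weight * mse
```

In an actual attack, each query to the target model would return a label and an explanation, and both would serve as supervision when fitting the substitute. The training loop is omitted here. Setting `weight=0.0` reduces the objective to a prediction-only extraction loss, which corresponds loosely to the baseline the abstract compares against.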