Abstract: Sharing a pre-trained machine learning model, particularly a deep neural network, via prediction APIs is becoming common practice on machine learning as a service (MLaaS) platforms. Although deep neural networks (DNNs) have shown remarkable success in many tasks, they are also criticized for their lack of interpretability and transparency. Interpreting a shared DNN model faces two additional challenges compared with interpreting a general model: (1) only limited training data can be disclosed to users, and (2) the internal structure of the model may not be available. These two challenges impede the application of most existing interpretability approaches for DNN models, such as saliency maps or influence functions. Case-based reasoning methods have been used for interpreting decisions; however, how to select and organize the data points under the constraints of shared DNN models has not been discussed. Moreover, simply providing cases as explanations may not be sufficient to support instance-level interpretability. Meanwhile, existing interpretation methods for DNN models generally lack the means to evaluate the reliability of the interpretation. In this article, we propose a framework named Shared Model INTerpreter (SMINT) to address the above limitations. We propose a new data structure called a boundary graph to organize training points so that they mimic the predictions of DNN models. We integrate local features, such as saliency maps and interpretable input masks, into the data structure to help users infer the model decision boundaries. We show that the boundary graph is able to address the reliability issues in many local interpretation methods. We further design an algorithm named hidden-layer aware p-test to measure the reliability of the interpretations.
Our experiments show that SMINT achieves above 99% fidelity to the corresponding DNN models on both MNIST and ImageNet while sharing only a tiny fraction of the training data to make these models interpretable. A human pilot study demonstrates that SMINT provides better interpretability than existing methods. Moreover, we demonstrate that SMINT is able to assist model tuning for better performance on different user data.
