Abstract: Driven by the goal of enabling sleep apnea monitoring and machine learning-based detection at home with small mobile devices, we investigate whether interpretation-based indirect knowledge transfer can be used to create classifiers with acceptable performance. Interpretation-based indirect knowledge transfer means that a classifier (student) learns from a synthetic dataset based on the knowledge representation of an already trained deep network (teacher). We use activation maximization to generate visualizations and create a synthetic dataset to train the student classifier. This approach has the advantage that student classifiers can be trained without access to the original training data. Through experiments, we investigate the feasibility of interpretation-based indirect knowledge transfer and its limitations. The student achieves an accuracy of 97.8% on MNIST (teacher accuracy: 99.3%) with an architecture similar to, but smaller than, that of the teacher. The student classifier achieves an accuracy of 86.1% and 89.5% for a subset of the Apnea-ECG dataset (teacher: 89.5% and 91.1%, respectively).

What problem does this paper attempt to address?

The paper asks how to create a classifier with acceptable performance through interpretation-based indirect knowledge transfer, without access to the original training data. Specifically, the authors use the knowledge representation of a trained deep neural network (DNN, the teacher model) to generate a synthetic dataset, and then train a smaller classifier (the student model) on that synthetic data, so that obstructive sleep apnea (OSA) monitoring and detection can run on small mobile devices with limited resources.

### Main problems and goals

1. **OSA monitoring on resource-limited devices**
   - The goal is to enable OSA monitoring and detection at home on small mobile devices (such as smartphones or smartwatches).
   - These devices have limited computing resources, so a method is needed for creating small but high-performing classifiers.
2. **Knowledge transfer challenges**
   - The main challenge is transferring the knowledge of a large deep neural network (teacher model) to a small classifier (student model) without access to the original training data.
   - Access is restricted because health data, such as sleep monitoring data, raises privacy and sharing issues.
3. **Interpretation-based indirect knowledge transfer**
   - The authors propose a new knowledge transfer method called interpretation-based indirect knowledge transfer. It uses activation maximization to generate visualizations, which form a synthetic dataset for training the student model.
   - The steps are:
     - Train the teacher model.
     - Use activation maximization to generate a synthetic dataset.
     - Train the student model on the synthetic dataset.
4. **Evaluation and verification**
   - The authors examine the feasibility and limitations of the method through experiments.
   - The student model achieves an accuracy of 97.8% on MNIST (teacher: 99.3%). On a subset of the Apnea-ECG dataset, the student reaches 86.1% and 89.5% (teacher: 89.5% and 91.1%, respectively).

### Core contributions

1. **Demonstrate that a classifier can learn from feature visualizations**
   - The experiments show that a student model trained only on synthetic data generated from the teacher model can classify real data it has never seen.
2. **Develop a new knowledge transfer technique**
   - The proposed method trains the student network without the original training data and is independent of the architecture or learning algorithm.
3. **Introduce new metrics**
   - Two new metrics are proposed. One measures how difficult it is for other trained classifiers to recognize the synthetic data. The other measures whether "hidden" information can be transferred between classifiers without being detected by most classifiers.

### Summary

The main goal of the paper is to enable OSA monitoring on resource-limited devices. Using interpretation-based indirect knowledge transfer, the authors train a small classifier without the original training data and report its performance on MNIST and on a subset of the Apnea-ECG dataset.
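The three steps listed above (train a teacher, generate a synthetic dataset by activation maximization, train a student on it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: 2-D Gaussian blobs stand in for MNIST or Apnea-ECG, a tiny NumPy network with one hidden layer stands in for the deep teacher, each synthetic input is labeled with the class whose probability was maximized, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n_per_class, centers, rng):
    """Toy stand-in for real data: one 2-D Gaussian blob per class."""
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

def init_mlp(n_in, n_hidden, n_out, rng):
    """One-hidden-layer tanh network with small random weights."""
    return {
        "W1": 0.5 * rng.standard_normal((n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": 0.5 * rng.standard_normal((n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(p, X):
    """Return hidden activations and softmax class probabilities."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    Z = H @ p["W2"] + p["b2"]
    E = np.exp(Z - Z.max(axis=1, keepdims=True))  # shift for numerical stability
    return H, E / E.sum(axis=1, keepdims=True)

def train(p, X, y, epochs=1000, lr=0.1):
    """Full-batch gradient descent on the cross-entropy loss."""
    Y = np.eye(p["b2"].size)[y]
    for _ in range(epochs):
        H, P = forward(p, X)
        dZ = (P - Y) / len(X)                     # gradient w.r.t. the logits
        dPre = (dZ @ p["W2"].T) * (1.0 - H ** 2)  # backpropagate through tanh
        p["W2"] -= lr * (H.T @ dZ)
        p["b2"] -= lr * dZ.sum(axis=0)
        p["W1"] -= lr * (X.T @ dPre)
        p["b1"] -= lr * dPre.sum(axis=0)
    return p

def accuracy(p, X, y):
    return float((forward(p, X)[1].argmax(axis=1) == y).mean())

def activation_maximization(p, target, n_samples, rng, steps=100, lr=0.2, decay=0.05):
    """Gradient ascent on the inputs to raise log p(target | x) under model p."""
    X = rng.standard_normal((n_samples, p["W1"].shape[0]))  # random starting points
    onehot = np.eye(p["b2"].size)[target]
    for _ in range(steps):
        H, P = forward(p, X)
        dZ = onehot - P                           # d log p_target / d logits
        dX = ((dZ @ p["W2"].T) * (1.0 - H ** 2)) @ p["W1"].T
        X += lr * (dX - decay * X)                # L2 decay keeps inputs bounded
    return X

centers = np.array([[0.0, 3.0], [-3.0, -2.0], [3.0, -2.0]])
X_train, y_train = make_blobs(200, centers, rng)
X_test, y_test = make_blobs(100, centers, rng)

# Step 1: train the teacher on the "original" data.
teacher = train(init_mlp(2, 16, 3, rng), X_train, y_train)

# Step 2: build a synthetic dataset from the teacher alone (assumed labeling:
# each synthetic input gets the class whose probability was maximized).
X_syn = np.vstack([activation_maximization(teacher, c, 200, rng) for c in range(3)])
y_syn = np.repeat(np.arange(3), 200)

# Step 3: train a smaller student on the synthetic data only.
student = train(init_mlp(2, 4, 3, rng), X_syn, y_syn)

teacher_acc = accuracy(teacher, X_test, y_test)
student_acc = accuracy(student, X_test, y_test)
print(f"teacher accuracy on real test data: {teacher_acc:.3f}")
print(f"student accuracy on real test data: {student_acc:.3f}")
```

The student never sees `X_train`; it only sees inputs derived from the teacher's parameters, which is the property that makes the approach attractive when the original data cannot be shared. The random starting points and the decay term are one simple way to get varied, bounded synthetic inputs; the paper's actual generation procedure may differ.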

Learning from Higher-Layer Feature Visualizations
