Knowledge distillation with insufficient training data for regression
Myeonginn Kang, Seokho Kang
DOI: https://doi.org/10.1016/j.engappai.2024.108001
IF: 8
2024-02-04
Engineering Applications of Artificial Intelligence
Abstract: Knowledge distillation has been widely used to compress a large teacher network into a smaller student network. Conventional approaches require the training dataset that was used to train the teacher network. However, in many real-world situations, the original training dataset is not fully reusable owing to practical constraints, such as data security, privacy, and storage limits. In this study, we present a teacher–student matching method to improve knowledge distillation under data insufficiency for regression problems. Given an existing knowledge distillation method as the base, we introduce three additional learning objectives to make the student better emulate the prediction capability of the teacher: perturbation-based matching (PM), adversarial belief matching (ABM), and gradient matching (GM). PM is for matching the predictions of the teacher and student on synthetic data points created by perturbing original data points. ABM is for matching the predictions of the teacher and student on data points on which the teacher and student make different predictions. GM is for matching the gradients of the teacher and student on the original and synthetic data points. We demonstrate that the proposed method improves the prediction performance of the student network, particularly when only a small part of the original training dataset is available for use. When 10% of the original training dataset is used for knowledge distillation, the root mean squared error of the student network is reduced by 43.91% on average compared with existing knowledge distillation methods.
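The three objectives named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the loss formulas, the perturbation distribution, how the disagreement points for ABM are obtained, or the loss weights. Everything below (the one-hidden-layer tanh networks, Gaussian perturbation with scale `sigma`, gradient ascent on the squared prediction gap, squared-error matching, equal weights, and all function names) is an assumption chosen only to make the ideas concrete. Only the loss values are computed; no student training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(d_in, d_hidden, rng):
    """Hypothetical one-hidden-layer regressor: f(x) = tanh(x W1) . w2"""
    return {"W1": rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in),
            "w2": rng.normal(size=d_hidden) / np.sqrt(d_hidden)}

def predict(net, x):
    """Scalar prediction for each row of x."""
    return np.tanh(x @ net["W1"]) @ net["w2"]

def input_grad(net, x):
    """Analytic gradient of the prediction w.r.t. each input row, shape (n, d_in)."""
    h = np.tanh(x @ net["W1"])
    return ((1.0 - h ** 2) * net["w2"]) @ net["W1"].T

def match_loss(teacher, student, x):
    """Mean squared gap between teacher and student predictions on x."""
    return np.mean((predict(teacher, x) - predict(student, x)) ** 2)

def perturb(x, sigma, rng):
    """PM (assumed form): synthetic points from Gaussian noise around x."""
    return x + sigma * rng.normal(size=x.shape)

def disagreement_points(teacher, student, x, steps=10, lr=0.05):
    """ABM (assumed form): move points uphill on the squared prediction gap."""
    x_adv = x.copy()
    for _ in range(steps):
        gap = predict(teacher, x_adv) - predict(student, x_adv)
        grad = 2.0 * gap[:, None] * (input_grad(teacher, x_adv)
                                     - input_grad(student, x_adv))
        x_adv = x_adv + lr * grad
    return x_adv

def gm_loss(teacher, student, x):
    """GM (assumed form): mean squared distance between input gradients."""
    diff = input_grad(teacher, x) - input_grad(student, x)
    return np.mean(np.sum(diff ** 2, axis=1))

# Toy setup: a larger teacher, a smaller student, and a small retained dataset.
teacher = make_mlp(4, 16, rng)
student = make_mlp(4, 4, rng)
x = rng.normal(size=(20, 4))

x_pm = perturb(x, sigma=0.1, rng=rng)
x_abm = disagreement_points(teacher, student, x)

base = match_loss(teacher, student, x)        # stand-in for the base KD loss
pm = match_loss(teacher, student, x_pm)       # perturbation-based matching
abm = match_loss(teacher, student, x_abm)     # adversarial belief matching
gm = gm_loss(teacher, student, np.vstack([x, x_pm, x_abm]))  # gradient matching

total = base + pm + abm + gm                  # equal weights are an assumption
print(f"base={base:.4f} pm={pm:.4f} abm={abm:.4f} gm={gm:.4f} total={total:.4f}")
```

In a training setting, `total` would be minimized with respect to the student's parameters while the teacher stays fixed; an automatic differentiation framework would normally replace the hand-written `input_grad`.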
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary