Heterogeneous Defect Prediction Based on Federated Transfer Learning via Knowledge Distillation

Aili Wang, Yutong Zhang, Yixin Yan
DOI: https://doi.org/10.1109/access.2021.3058886
IF: 3.9
2021-01-01
IEEE Access
Abstract: Heterogeneous defect prediction (HDP) aims to predict defect-prone software modules in one project using heterogeneous data collected from other projects. Defect data have two characteristics: data islands and data privacy. In this article, we propose a novel Federated Transfer Learning via Knowledge Distillation (FTLKD) approach for HDP that accounts for both characteristics. First, Shamir secret sharing provides homomorphic encryption of the private data, which remain encrypted throughout all subsequent processing and operations. Second, each participant trains a convolutional neural network (CNN) on public data, transfers the parameters of the pre-trained CNN to a private model, and fine-tunes that private model with a small amount of labeled private data. Finally, knowledge distillation provides the communication between participants: the average of all participants' softmax outputs (logits) is used to update the private models. Extensive experiments on nine projects from three public databases (NASA, AEEEM and SOFTLAB) show that FTLKD outperforms the related competing methods.
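The communication step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`softmax`, `consensus`, `distillation_loss`), the `temperature` parameter, and the choice to average raw class scores before applying softmax are all assumptions made for illustration, since the abstract does not say whether the averaged quantity is the pre-softmax logits or the softmax probabilities.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw class scores into probabilities (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def consensus(per_participant_scores):
    """Element-wise average of the class scores reported by each participant.

    Assumption: every participant scores the same public sample and
    returns one list with one score per class.
    """
    n = len(per_participant_scores)
    return [sum(column) / n for column in zip(*per_participant_scores)]


def distillation_loss(student_logits, consensus_logits, temperature=1.0):
    """Cross-entropy between the softened consensus and the student's prediction.

    A private model that lowers this loss moves its output toward the
    averaged output of all participants.
    """
    target = softmax(consensus_logits, temperature)
    predicted = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Hypothetical scores from three participants for one public module,
# over two classes (clean, defective).
scores = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
teacher = consensus(scores)

# A private model that agrees with the consensus incurs a lower loss
# than one that disagrees with it.
agreeing = distillation_loss([1.0, 1.0], teacher)
disagreeing = distillation_loss([0.0, 3.0], teacher)
```

In a full system each participant would compute this loss over a batch of public samples and take gradient steps on its private CNN; only the class scores, not the private data or model weights, would be exchanged.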
computer science, information systems; telecommunications; engineering, electrical & electronic