Abstract: Large code models (LCMs) have substantially advanced the field of code intelligence. Despite their impressive capabilities, they still face practical deployment challenges, such as high costs, the limited accessibility of proprietary LCMs, and the poor adaptability of ultra-large LCMs. These challenges highlight the need for more accessible, lightweight yet effective LCMs. In this paper, we propose IterKD, an Iterative Knowledge Distillation framework that continually transfers the programming capabilities of larger, advanced LCMs (Teacher) to smaller, less powerful LCMs (Student). Each IterKD cycle consists of three stages: (1) the Correct-and-Fault Knowledge Delivery stage improves the student model's ability to recognize errors while preserving its basic programming skills during knowledge transfer, using correctness-aware supervised learning and fault-aware contrastive learning; (2) the Multi-view Feedback stage measures the quality of the results generated by the student model from two views, namely model-based and static tool-based measurement; (3) the Feedback-based Knowledge Update stage updates the student model adaptively by generating new questions at different difficulty levels, where the difficulty levels are assigned based on the feedback from the previous stage. By repeating this training cycle, the student model is continuously refined as it learns more advanced programming skills from the teacher model. Finally, based on the proposed IterKD framework, we develop a lightweight yet effective LCM, named IterCoder, which is built upon CodeLlama-7B. Experimental results show that IterCoder achieves a Pass@1 score of 65.2 on the HumanEval benchmark, outperforming LCMs with more than 30B parameters by an average of 47.51% and surpassing comparably sized LCMs by an average of 118.47%.
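
The control flow of one cycle, as described in the abstract, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Feedback` fields, the 0.5 threshold, the rule that maps the two feedback views to "easy", "medium", and "hard", and every callable passed to `run_cycle` are hypothetical stand-ins for components the abstract names but does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feedback:
    model_score: float  # hypothetical: rating from a judge model, in [0, 1]
    static_ok: bool     # hypothetical: True if a static tool reports no issues


def difficulty(fb: Feedback, threshold: float = 0.5) -> str:
    """Bucket a question by how many feedback views the student's answer passed.

    Assumption: passing both views means the question is easy for the student,
    passing neither means it is hard. The abstract does not give the actual rule.
    """
    passed = int(fb.model_score >= threshold) + int(fb.static_ok)
    return {2: "easy", 1: "medium", 0: "hard"}[passed]


def run_cycle(
    questions: List[str],
    train_student: Callable[[List[str]], None],
    answer: Callable[[str], str],
    judge: Callable[[str, str], float],
    lint: Callable[[str], bool],
    new_questions: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Run one three-stage cycle and return the questions for the next cycle."""
    # Stage 1: knowledge delivery (supervised + contrastive training, abstracted away).
    train_student(questions)

    # Stage 2: multi-view feedback on the student's own answers.
    buckets: Dict[str, List[str]] = {"easy": [], "medium": [], "hard": []}
    for q in questions:
        code = answer(q)
        fb = Feedback(model_score=judge(q, code), static_ok=lint(code))
        buckets[difficulty(fb)].append(q)

    # Stage 3: knowledge update, generating new questions per difficulty level.
    return [nq for level, qs in buckets.items() for nq in new_questions(level, qs)]
```

Calling `run_cycle` repeatedly, feeding each returned question list into the next call, mirrors the iterative refinement the abstract describes; in practice the stand-in callables would wrap the teacher model, the student model, and a static analysis tool.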

Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning

Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning

Comparative Knowledge Distillation

Class Incremental Learning with Multi-Teacher Distillation

CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

Revisiting Knowledge Distillation Via Label Smoothing Regularization

Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation

Learn From the Past: Experience Ensemble Knowledge Distillation

Collaborative Knowledge Distillation

TC³KD: Knowledge distillation via teacher-student cooperative curriculum customization

Rethinking Knowledge Distillation Via Cross-Entropy

Adaptive Teaching with Shared Classifier for Knowledge Distillation

DiffClass: Diffusion-Based Class Incremental Learning

Improving Knowledge Distillation With a Customized Teacher

Improving Knowledge Distillation with Teacher's Explanation

Improved Knowledge Distillation via Adversarial Collaboration

Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning

Iterative Knowledge Distillation through Feedback-Driven Learning Cycles

An Embarrassingly Simple Approach for Knowledge Distillation

Class Incremental Learning Via Dynamic Regeneration with Task-Adaptive Distillation