Safurai-Csharp: Harnessing Synthetic Data to improve language-specific Code LLM

Davide Cifarelli, Leonardo Boiardi, Alessandro Puppo, Leon Jovanovic
2023-11-07
Abstract: This paper introduces Safurai-Csharp, an open-source model specialized in the generation, completion, and debugging of C# code. Safurai-Csharp is built upon the novel CodeLlama 34B model and is fine-tuned on a dataset refined and expanded with the EvolInstruct technique. It achieves a notable score of 56.33% on the Manual MultiPL-E benchmark (Zero-Shot, Pass@1), which signals a high capacity to streamline developers' workflows and aid code learning. The model shows promise in raising the bar for open-source C# LLMs, and we hope it inspires more inclusive and wide-ranging development in the field of language-specific LLMs.
Computation and Language