Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.

What problem does this paper attempt to address?

The paper addresses the problem of understanding and generating long-context code with open-source code language models. It introduces IBM's long-context Granite code models, in 3B and 8B versions, which support effective context lengths of up to 128K tokens. The researchers extended the models' ability to handle long sequences through continual pretraining and instruction tuning, without significantly affecting performance on standard code completion benchmarks. The key contributions of the paper are as follows:

1. **Continual Pretraining**: The researchers adopted a lightweight continual pretraining method that gradually increases the RoPE base frequency and uses repository-level file packing and length upsampling of long-context data to extend the context length of the Granite 3B/8B code models from 2K/4K to 128K.
2. **Instruction Tuning**: To further strengthen long-context support, the researchers performed instruction tuning on a dataset that mixes short-context and long-context instruction-response pairs. The multi-turn instruction data was generated with the original Granite-8B-Code-Instruct model to avoid reliance on existing long-context models.
3. **Performance Evaluation**: The paper evaluates the long-context Granite code models on several benchmarks, including HumanEvalPack, Long Code Completion, RepoBench-P, RepoQA, and Key Retrieval. The results show significant improvements on long-context tasks and no significant degradation on short-context tasks.
4. **Open Source Release**: All long-context Granite code models are released under the Apache 2.0 license for research and commercial use.
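The first contribution depends on raising the RoPE base as the context window grows. As a minimal sketch (not the authors' code), the snippet below computes the standard RoPE inverse frequencies and shows how a larger base lengthens the period of the slowest-rotating dimension pair. The schedule values, the head dimension of 128, and all function names are illustrative assumptions; the text above states only that the base is increased gradually while the context grows from 2K/4K to 128K.

```python
import math

def rope_inv_freq(head_dim: int, base: float) -> list:
    """Standard RoPE inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim: int, base: float) -> float:
    """Period, in token positions, of the slowest-rotating dimension pair."""
    slowest = rope_inv_freq(head_dim, base)[-1]
    return 2.0 * math.pi / slowest

# Hypothetical staged schedule of (target context length, RoPE base).
# These numbers are assumptions for illustration, not values from the paper.
SCHEDULE = [
    (4_096, 10_000.0),
    (16_384, 100_000.0),
    (65_536, 1_000_000.0),
    (131_072, 10_000_000.0),
]

def describe(schedule, head_dim: int = 128):
    """Return (context, base, longest wavelength) for each training stage."""
    return [(ctx, base, longest_wavelength(head_dim, base)) for ctx, base in schedule]

if __name__ == "__main__":
    for ctx, base, wavelength in describe(SCHEDULE):
        print(f"context={ctx:>7}  base={base:>12.0f}  longest wavelength={wavelength:,.0f}")
```

Under these assumed values, the longest wavelength grows at every stage, which matches the usual intuition that a larger base keeps distant positions distinguishable in the rotary encoding.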
In summary, the paper demonstrates an effective way to extend the context length of code language models so that they can better understand and generate long code sequences, which is important in real-world software development.
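The data side of the recipe, repository-level file packing and length upsampling, can be sketched in a few lines. This is one possible reading under stated assumptions, not the paper's pipeline: the separator string, the path-sorted file order, the whitespace token count, and the `min_tokens` and `factor` parameters are all hypothetical.

```python
from collections import defaultdict

# Hypothetical separator placed between files of the same repository.
FILE_SEP = "\n<file_sep>\n"

def pack_by_repository(files):
    """Concatenate all files of a repository into one training document.

    `files` is an iterable of (repo, path, text) tuples.
    Returns a dict mapping repo name to its packed text.
    """
    by_repo = defaultdict(list)
    for repo, path, text in files:
        by_repo[repo].append((path, text))
    packed = {}
    for repo, items in by_repo.items():
        items.sort()  # path order is an assumption; other orderings are possible
        packed[repo] = FILE_SEP.join(f"# {path}\n{text}" for path, text in items)
    return packed

def count_tokens(text: str) -> int:
    """Whitespace split as a stand-in for a real tokenizer."""
    return len(text.split())

def upsample_long(docs, min_tokens: int, factor: int):
    """Repeat documents with at least `min_tokens` tokens `factor` times."""
    out = []
    for doc in docs:
        copies = factor if count_tokens(doc) >= min_tokens else 1
        out.extend([doc] * copies)
    return out

if __name__ == "__main__":
    files = [
        ("repo_a", "b.py", "x = 1"),
        ("repo_a", "a.py", "y = 2"),
        ("repo_b", "main.py", "z"),
    ]
    packed = pack_by_repository(files)
    docs = upsample_long(list(packed.values()), min_tokens=8, factor=3)
    print(len(docs), "documents after upsampling")
```

Packing by repository keeps related files in the same sequence, and upsampling raises the share of long documents in the training mix; a real pipeline would use the model's tokenizer and its own thresholds.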

Scaling Granite Code Models to 128K Context

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Data Engineering for Scaling Language Models to 128K Context

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

RULER: What's the Real Context Size of Your Long-Context Language Models?

CodeGRU: Context-aware deep learning with gated recurrent unit for source code modeling

How to Train Long-Context Language Models (Effectively)

Context Parallelism for Scalable Million-Token Inference

Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis

RepoFusion: Training Code Models to Understand Your Repository

Parallel Context Windows for Large Language Models

Training-Free Long-Context Scaling of Large Language Models

Long-Context Language Modeling with Parallel Context Encoding

Training-Free Exponential Context Extension via Cascading KV Cache

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving

Granite Guardian

Long Context RAG Performance of Large Language Models