Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Jiaju Lin, Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Cahya Wirawan, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
2024-09-27
Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
Computation and Language, Artificial Intelligence