H2O-Danube3 Technical Report

Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati
2024-07-12
Abstract: We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality Web data consisting primarily of English tokens in three stages with different data mixes, before final supervised tuning for the chat versions. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be run efficiently on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper introduces H2O-Danube3, a series of small language models comprising H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Both models are pre-trained in multiple stages with different data mixes on high-quality Web data consisting primarily of English tokens, and are then supervised fine-tuned to produce the chat versions. The models are highly competitive across academic, chat, and fine-tuning benchmarks, and their compact architecture lets them run efficiently on modern smartphones, supporting on-device inference and fast processing.
The research extends previous work on small language models, with a focus on efficient inference and edge-device applications. After task-specific fine-tuning, these small models even outperform certain BERT-based models on tasks such as sequence classification, question answering, and token classification. The paper describes the model architecture, training process, and fine-tuning steps in detail, and reports extensive evaluations covering standard academic metrics, chat benchmarks, and fine-tuning benchmarks. The results show that H2O-Danube3 performs strongly across these dimensions, widening the choice of open-source small language models.
In addition, the paper introduces the iOS application H2O AI Personal GPT, which allows users to run H2O-Danube3 offline on their mobile phones. The models are also quantized to reduce their size while maintaining performance, making them suitable for resource-constrained devices. Through these efforts, the paper aims to further democratize language models, serving a wider audience economically across scenarios such as chatbots, task-specific applications, research, and offline on-device use.