Abstract: We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, with techniques such as QK Normalization and Z-Loss used to keep training stable. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4. The base model is available at https://huggingface.co/pfnet/plamo-100b.
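The two stability techniques named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulations: Z-Loss as an auxiliary penalty on the squared log of the softmax normalizer, and QK Normalization as normalizing query and key vectors before their dot product. The coefficient `z_coeff`, the choice of RMS normalization without a learned gain, and all function names are illustrative assumptions, not details taken from the paper.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer Z = sum(exp(logit))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coeff=1e-4):
    """Return (total_loss, z_loss) for one token.

    z_coeff=1e-4 is an illustrative value, not one reported in the paper.
    """
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]      # equals -log softmax(logits)[target]
    z_loss = z_coeff * log_z ** 2    # penalizes log(Z) drifting away from 0
    return ce + z_loss, z_loss


def rms_normalize(vec, eps=1e-6):
    """Scale a vector to roughly unit root-mean-square (learned gain omitted)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def qk_norm_score(query, key):
    """Attention logit computed from normalized query and key vectors."""
    q = rms_normalize(query)
    k = rms_normalize(key)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
```

With both vectors normalized, the attention logit stays within about plus or minus the square root of the head dimension however large the raw activations grow, which is the intuition usually given for why this kind of normalization helps stability.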

What problem does this paper attempt to address?

The paper sets out to develop PLaMo-100B, a large-scale language model designed specifically for Japanese proficiency. It focuses on the following aspects:

1. **Improving Japanese processing ability**: Unlike general-purpose large language models, PLaMo-100B is optimized for Japanese, with the goal of better performance on Japanese-specific tasks.
2. **Training from scratch**: Rather than fine-tuning the weights of an existing model, PLaMo-100B is trained from scratch on 2 trillion tokens, of which 1.5 trillion are used for initial pre-training and 0.5 trillion for continued pre-training. The intent is for the model to adapt well to both Japanese and English tasks.
3. **Stabilizing training**: The paper uses techniques such as QK Normalization and Z-Loss to keep large-scale training stable without sacrificing model quality.
4. **Post-training optimization**: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) are applied to further improve the model. The paper also describes in detail how high-quality training data was generated for this stage.
5. **Evaluating model performance**: PLaMo-100B is evaluated on several benchmarks, including Jaster, Japanese MT-Bench, and the Rakuda Benchmark. The results indicate that it is competitive on both Japanese and English tasks and performs especially well on Japanese tasks, outperforming GPT-4 on some benchmarks.

In summary, the paper's main objective is to develop a high-performance Japanese language model. It pursues this by training from scratch and combining training-stability techniques with post-training optimization, aiming for strong performance on Japanese-specific tasks.
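To make the Direct Preference Optimization step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the usual formulation, in which a trainable policy is compared against a frozen reference model on a chosen and a rejected response. The function name, argument names, and `beta=0.1` are illustrative assumptions, not settings reported for PLaMo-100B.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model. beta=0.1 is illustrative.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy has shifted probability toward the chosen response: lower loss.
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, so no separately trained reward model is needed.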

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
