Release of Pre-Trained Models for the Japanese Language

Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda
2024-04-02
Abstract: AI democratization aims to create a world in which the average person can utilize AI techniques. To achieve this goal, numerous research institutes have attempted to make their results accessible to the public. In particular, large pre-trained models trained on large-scale data have shown unprecedented potential, and their release has had a significant impact. However, most of the released models specialize in the English language, and thus AI democratization in non-English-speaking communities is lagging significantly. To reduce this gap in AI access, we released Generative Pre-trained Transformer (GPT), Contrastive Language and Image Pre-training (CLIP), Stable Diffusion, and Hidden-unit Bidirectional Encoder Representations from Transformers (HuBERT) models pre-trained in Japanese. These models let users freely interface with AI that aligns with Japanese cultural values and preserves the identity of Japanese culture, thus enhancing the democratization of AI. Additionally, experiments showed that pre-trained models specialized for Japanese can efficiently achieve high performance in Japanese tasks.
Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **Promoting AI Democratization**: By releasing pre-trained models optimized for Japanese, it lowers the barriers for non-English speakers to access and utilize advanced AI technologies.
2. **Bridging the Language Gap**: Most existing high-performance pre-trained models focus on English, leading to a significant lag in AI resource access for non-English communities. The paper addresses this issue by releasing a series of pre-trained models specifically for Japanese.
3. **Cultural Adaptability**: Ensuring that the released models reflect Japanese cultural values and maintain the characteristics of Japanese culture, thereby enhancing the inclusivity of AI democratization.

Specifically, the paper releases the following types of Japanese pre-trained models:

- **Language Model (GPT)**: A Japanese language model based on the Generative Pre-trained Transformer (GPT) architecture, used for text generation tasks.
- **Language-Image Model (CLIP)**: A model that connects visual concepts with natural language, used for tasks such as zero-shot image classification.
- **Stable Diffusion Model**: A model used to generate high-quality images based on text prompts.
- **Speech Model (HuBERT)**: A self-supervised speech representation learning model used for automatic speech recognition tasks.

Through experimental validation, these Japanese-specific pre-trained models have been shown to efficiently achieve high performance on Japanese-related tasks. Additionally, the Stable Diffusion model has demonstrated its ability to process Japanese inputs and produce outputs that align with Japanese cultural characteristics.
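To make the zero-shot image classification use case mentioned for the CLIP model more concrete, the following is a minimal sketch of the general CLIP-style procedure: embed the image and each candidate label into a shared space, then pick the label whose text embedding has the highest cosine similarity to the image embedding. It does not call the released models. The function names `cosine` and `zero_shot_classify`, the `temperature` value, and the three-dimensional toy embeddings are all assumptions made for illustration; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, label_embs, temperature=0.01):
    """Return a probability for each label via softmax over scaled similarities.

    `temperature` is a hypothetical scaling value chosen for this sketch.
    """
    labels = list(label_embs)
    logits = [cosine(image_emb, label_embs[name]) / temperature for name in labels]
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}


# Toy embeddings (illustrative values only, not outputs of any real encoder).
label_embs = {
    "犬": [0.9, 0.1, 0.0],      # "dog"
    "猫": [0.1, 0.9, 0.0],      # "cat"
    "富士山": [0.0, 0.1, 0.9],  # "Mt. Fuji"
}
image_emb = [0.2, 0.8, 0.1]

probs = zero_shot_classify(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

Because the labels are supplied as free text at inference time, no classifier has to be retrained when the label set changes, which is what makes the approach "zero-shot".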