LaVy: Vietnamese Multimodal Large Language Model

Chi Tran, Huong Le Thanh
2024-04-17
Abstract: Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we pioneer in addressing this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce the LaVy-Bench benchmark, designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. Our project is public at
Computation and Language, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the lack of high-quality resources for developing Vietnamese Multimodal Large Language Models (MLLMs). Its contributions are as follows:

1. **Proposing the LaVy Model**: The authors present LaVy as a pioneering multimodal large language model designed specifically for Vietnamese, and describe it as achieving state-of-the-art performance on Vietnamese visual language tasks by combining visual and linguistic information.
2. **Constructing the LaVy-Bench Benchmark Dataset**: To evaluate how well multimodal language models handle Vietnamese visual understanding tasks, the researchers developed the LaVy-Bench benchmark. It includes zero-shot visual question answering (VQA) and real-world scene test sets, which together assess a model's visual language understanding and generation capabilities.

By introducing LaVy and its accompanying benchmark, the paper aims to bridge the gap between Vietnamese unimodal language models and multimodal language models, giving researchers and practitioners tools for advancing Vietnamese multimodal language understanding.