Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources

Issey Sukeda

2024-09-20

Abstract: The recent success of large language models (LLMs) and the scaling law has led to widespread adoption of larger models. Particularly in the healthcare industry, there is an increasing demand for locally operated LLMs due to security concerns. However, the majority of high-quality open-source LLMs have a size of 70B parameters, imposing significant financial burdens on users for GPU preparation and operation. To overcome these issues, we present a medical adaptation based on recent 7B models, which enables operation with low computational resources. We compare performance on medical question-answering benchmarks in two languages (Japanese and English), demonstrating that its scores reach parity with or surpass those of existing medical LLMs that are ten times larger. We find that fine-tuning an English-centric base model on a Japanese medical dataset improves scores in both languages, supporting the effect of cross-lingual knowledge transfer. We hope that this study will alleviate financial challenges, serving as a stepping stone for clinical institutions to practically utilize LLMs locally. Our evaluation code is available at https://github.com/stardust-coder/japanese-lm-med-harness.

Computation and Language

What problem does this paper attempt to address?

The main problem that this paper attempts to solve is to develop a large language model (LLM) in the Japanese medical field that can operate efficiently and perform well under limited computing resources. Specifically, the paper focuses on the following aspects:

1. **Reducing computing costs**: Currently, most high-quality open-source large language models have 70 billion parameters, which brings a huge financial burden to users, especially in terms of GPU preparation and operation. Therefore, researchers hope to use models with a smaller number of parameters (about 7 billion) to reduce the demand for computing resources.
2. **Improving the security of local deployment**: In the medical industry, security issues are particularly important due to the involvement of patients' personal privacy. Large language models are usually only accessible through API services, which limits their practical application in the clinical environment. Researchers hope to improve data security and model customization by developing miniaturized local models.
3. **Achieving cross-language knowledge transfer**: Researchers hope to verify the effect of cross-language knowledge transfer by fine-tuning an English-centered base model on a Japanese medical data set, that is, to improve the performance of the model on Japanese medical tasks without sacrificing English performance.
4. **Evaluating model performance**: The paper verifies the effectiveness of the proposed method by comparing the performance of the model in medical question-answering benchmark tests in two languages (Japanese and English). Researchers hope to prove that, through appropriate fine-tuning, small-scale models can reach or exceed the performance of existing large-scale medical LLMs.

In summary, this paper aims to explore how to develop a miniaturized large language model in the Japanese medical field that can ensure performance and improve security under limited computing resources.
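To make the gap between 70-billion and 7-billion-parameter models in the first aspect more concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes memory is dominated by the weights alone (parameter count times bytes per parameter) and ignores activations, the KV cache, and framework overhead, so real requirements will be higher. The function name `weight_memory_gb` and the precision table are illustrative choices.

```python
# Rough weights-only memory estimate for a dense model.
# Assumption: memory = parameter count * bytes per parameter.
# Activations, KV cache, and runtime overhead are deliberately ignored.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(n_params, dtype)
            print(f"{name} model, {dtype}: about {size:.1f} GB of weights")
```

Under these assumptions, a 7B model in 16-bit precision needs roughly 14 GB for its weights, while a 70B model needs roughly 140 GB, which typically means several GPUs rather than one.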
