LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model

Zhi Zhou, Jiang-Xin Shi, Peng-Xiao Song, Xiao-Wen Yang, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

2024-06-07

Abstract: Large language models (LLMs), including both proprietary and open-source models, have showcased remarkable capabilities in addressing a wide range of downstream tasks. Nonetheless, when it comes to practical Chinese legal tasks, these models fail to meet the actual requirements. Proprietary models do not ensure data privacy for sensitive legal cases, while open-source models demonstrate unsatisfactory performance due to their lack of legal knowledge. To address this problem, we introduce LawGPT, the first open-source model specifically designed for Chinese legal applications. LawGPT comprises two key components: legal-oriented pre-training and legal supervised fine-tuning. Specifically, we employ large-scale Chinese legal documents for legal-oriented pre-training to incorporate legal domain knowledge. To further improve the model's performance on downstream legal tasks, we create a knowledge-driven instruction dataset for legal supervised fine-tuning. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available at https://github.com/pengxiao-song/LaWGPT and have received 5.7K stars on GitHub.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper targets two major obstacles to applying existing large language models (LLMs) to Chinese legal tasks:

1. **Lack of legal domain knowledge**: Existing open-source general-purpose LLMs perform poorly on legal tasks because they lack legal domain knowledge.
2. **Insufficient training on downstream legal tasks**: These models are inadequately trained on specific downstream legal tasks, which leads to suboptimal performance in practical legal applications.

To address these issues, the paper introduces LawGPT, the first open-source large language model specifically designed for Chinese legal applications. LawGPT improves performance on legal tasks through Legal-Oriented Pre-Training (LPT) and Legal Supervised Fine-Tuning (LFT):

- **Legal-Oriented Pre-Training**: Pre-training on a large-scale corpus of Chinese legal documents to incorporate legal domain knowledge.
- **Legal Supervised Fine-Tuning**: Creating a knowledge-driven instruction dataset to further improve the model's performance on downstream legal tasks.

Experimental results show that LawGPT outperforms the open-source LLaMA 7B model on multiple legal tasks, demonstrating its effectiveness and potential in the Chinese legal domain.
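The difference between the two stages can be illustrated by how each one would typically turn text into training targets. The following is a minimal sketch, not the authors' implementation: the `ToyTokenizer`, the helper names, and the `IGNORE_INDEX` value of -100 are hypothetical choices for illustration, and masking the instruction tokens during fine-tuning is a common convention assumed here rather than something the summary states.

```python
from dataclasses import dataclass

# Assumption: -100 is used as the "skip this position" label, a common convention.
IGNORE_INDEX = -100


@dataclass
class Example:
    input_ids: list[int]
    labels: list[int]


class ToyTokenizer:
    """Whitespace tokenizer that assigns ids on first sight (illustration only)."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]


def pretraining_example(tok: ToyTokenizer, document: str) -> Example:
    # Pre-training stage: every token of the raw legal document is a prediction target.
    ids = tok.encode(document)
    return Example(input_ids=ids, labels=list(ids))


def sft_example(tok: ToyTokenizer, instruction: str, response: str) -> Example:
    # Fine-tuning stage: only the response tokens contribute to the loss;
    # instruction tokens are masked out (an assumed, commonly used setup).
    prompt_ids = tok.encode(instruction)
    response_ids = tok.encode(response)
    return Example(
        input_ids=prompt_ids + response_ids,
        labels=[IGNORE_INDEX] * len(prompt_ids) + response_ids,
    )


tok = ToyTokenizer()
pt = pretraining_example(tok, "statute text goes here")
sft = sft_example(tok, "What does the statute say ?", "It says the following .")
```

In this sketch, `pt.labels` equals `pt.input_ids`, while `sft.labels` starts with one `IGNORE_INDEX` entry per instruction token, so only the response is learned from the instruction pair.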

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

InternLM-Law: An Open Source Chinese Legal Large Language Model

LawBench: Benchmarking Legal Knowledge of Large Language Models

Exploring New Frontiers of Deep Learning in Legal Practice: A Case Study of Large Language Models

Fine-tuning and Application of Large Language Model in Law Domain

A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction

Legal Evalutions and Challenges of Large Language Models

LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models

Large Language Models are legal but they are not: Making the case for a powerful LegalLLM

LawLLM: Law Large Language Model for the US Legal System

LexGPT 0.1: pre-trained GPT-J models with Pile of Law

Large language models as tax attorneys: a case study in legal capabilities emergence

Lawyer LLaMA Technical Report

PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models

Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents

LAiW: A Chinese Legal Large Language Models Benchmark

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

LegalReasoner: A Multi-Stage Framework for Legal Judgment Prediction via Large Language Models and Knowledge Integration

Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study of Chinese Criminal Law