LawLLM: Law Large Language Model for the US Legal System

Dong Shu, Haoran Zhao, Xukun Liu, David Demeter, Mengnan Du, Yongfeng Zhang
DOI: https://doi.org/10.1145/3627673.3680020
2024-07-28
Abstract: In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax, and historical context. Moreover, the subtle distinctions between similar and precedent cases require a deep understanding of legal knowledge. Researchers often conflate these concepts, making it difficult to develop specialized techniques to effectively address these nuanced tasks. In this paper, we introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain to address these challenges. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). By clearly distinguishing between precedent and similar cases, we provide essential clarity, guiding future research in developing specialized strategies for these tasks. We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format. Furthermore, we also use techniques such as in-context learning (ICL) and advanced information retrieval methods in LawLLM. The evaluation results demonstrate that LawLLM consistently outperforms existing baselines in both zero-shot and few-shot scenarios, offering unparalleled multi-task capabilities and filling critical gaps in the legal domain.
Computation and Language, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of finding relevant cases and accurately predicting judicial outcomes in legal analytics. It focuses on three aspects:

1. **Complexity**: Legal language often contains specialized terminology, complex syntax, and historical context, which makes legal texts hard to understand and process.
2. **Nuances**: Distinguishing similar cases from precedent cases requires deep legal knowledge, and existing research often conflates the two concepts.
3. **Multi-tasking**: Existing models usually handle only a single task, while legal practice involves several, such as Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).

To address these issues, the paper introduces the Law Large Language Model (LawLLM), a multi-task model designed specifically for the US legal domain. LawLLM covers three tasks:

1. **Similar Case Retrieval (SCR)**: Finding the cases most similar to the input case in a large collection of cases.
2. **Precedent Case Recommendation (PCR)**: Recommending precedent cases related to the input case.
3. **Legal Judgment Prediction (LJP)**: Predicting the likely judgment outcome of the input case.

By clearly distinguishing between similar cases and precedent cases and by proposing customized data preprocessing techniques for each task, LawLLM provides guidance for future research and fills a critical gap in the legal domain. Experimental results show that LawLLM consistently outperforms existing baseline models in both zero-shot and few-shot scenarios.
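To make the Similar Case Retrieval task more concrete, the following is a minimal sketch of top-k retrieval over a tiny case collection. It is not the method used in LawLLM, which this summary does not describe in detail; it swaps in a plain bag-of-words cosine similarity purely for illustration. The function names (`term_counts`, `cosine`, `retrieve_similar`), the case identifiers, and the case texts are all hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k cases whose text is most similar to the query."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(text)), case_id) for case_id, text in corpus.items()]
    # Highest score first; ties are broken by case id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [case_id for _, case_id in scored[:k]]


# Hypothetical toy corpus, invented for this example only.
corpus = {
    "case_a": "tenant sued landlord over unpaid security deposit",
    "case_b": "employee alleged wrongful termination by employer",
    "case_c": "landlord withheld security deposit after tenant moved out",
}

top = retrieve_similar(
    "dispute over a security deposit between tenant and landlord", corpus, k=2
)
print(top)
```

In this toy setting the two deposit disputes rank above the employment case because they share more vocabulary with the query. A real system would likely replace the term counts with learned text representations, and precedent recommendation would need a different signal than textual similarity, since the paper treats precedent cases and similar cases as distinct concepts.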