LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model

Amit Tewari

2024-04-16

Abstract: A contract is a type of legal document commonly used in organizations. Contract review is an integral and repetitive process to avoid business risk and liability. Contract analysis requires the identification and classification of key provisions and paragraphs within an agreement. Identifying and validating contract clauses can be a time-consuming and challenging task that demands the services of trained and expensive lawyers, paralegals or other legal assistants. Classifying legal provisions in contracts using artificial intelligence and natural language processing is complex because model training requires domain-specialized legal language and labeled data in the legal domain is scarce. General-purpose models are not effective in this context because contracts use specialized legal vocabulary that a general model may not recognize. To address this problem, we propose the use of a pre-trained large language model which is subsequently calibrated on legal taxonomy. We propose LegalPro-BERT, a BERT transformer architecture model that we fine-tune to efficiently handle the classification task for legal provisions. We conducted experiments to measure and compare metrics with current benchmark results. We found that LegalPro-BERT outperforms the previous benchmark used for comparison in this research.

Artificial Intelligence, Information Retrieval, Machine Learning

What problem does this paper attempt to address?

This paper addresses the automatic classification of legal clauses in contracts. Specifically, the author examines how a pre-trained large language model such as BERT can be fine-tuned to handle the classification of legal provisions efficiently. The problem matters in practice because contract review is time-consuming and costly, and usually requires the services of professional and expensive lawyers, paralegals or other legal assistants. Contracts use specialized legal vocabulary that general-purpose models may not recognize, which makes classification with such models less effective. The paper therefore proposes taking a pre-trained language model and fine-tuning it on legal-domain data to improve classification accuracy and efficiency.

The main contributions of the paper include:

1. Proposing LegalPro-BERT, a large language model based on BERT-large for classifying legal provisions.
2. Using the LexGLUE LEDGAR classification dataset as the benchmark for comparison.
3. Showing experimentally that LegalPro-BERT performs better than the existing benchmark methods on LEDGAR data.
4. Exploring the impact of data pre-processing techniques on model accuracy, and fine-tuning only some layers of the model to reduce the overall time and improve prediction performance.
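
The idea of fine-tuning only some layers can be illustrated with a small sketch. The summary does not say which layers the author updated, so the choice below (classification head plus the top two encoder layers) is purely an assumption for illustration. The parameter names assume the Hugging Face `BertForSequenceClassification` naming scheme, and the helper names `is_trainable` and `split_parameters` are hypothetical.

```python
import re

# Assumption: parameter names follow the Hugging Face
# BertForSequenceClassification convention, for example
# "bert.encoder.layer.23.output.dense.weight" or "classifier.weight".
_LAYER_RE = re.compile(r"^bert\.encoder\.layer\.(\d+)\.")


def is_trainable(param_name: str, num_layers: int = 24, top_k: int = 2) -> bool:
    """Return True if this parameter should be updated during fine-tuning.

    Illustrative policy (not taken from the paper): keep the classification
    head and the top_k encoder layers trainable; freeze everything else.
    num_layers defaults to 24, the encoder depth of BERT-large.
    """
    if param_name.startswith("classifier."):
        return True
    match = _LAYER_RE.match(param_name)
    if match:
        # Encoder layers are indexed from 0, so the top_k layers are
        # those with index >= num_layers - top_k.
        return int(match.group(1)) >= num_layers - top_k
    # Embeddings, pooler and any other parameters are frozen in this sketch.
    return False


def split_parameters(names, num_layers: int = 24, top_k: int = 2):
    """Partition parameter names into (trainable, frozen) lists."""
    trainable = [n for n in names if is_trainable(n, num_layers, top_k)]
    frozen = [n for n in names if not is_trainable(n, num_layers, top_k)]
    return trainable, frozen
```

With a PyTorch model loaded, the same predicate could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad = is_trainable(name)`. Freezing the lower layers means fewer gradients are computed and stored, which is the usual source of the time savings that partial fine-tuning aims for.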

