LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model

Amit Tewari
2024-04-16
Abstract: A contract is a type of legal document commonly used in organizations. Contract review is an integral and repetitive process for avoiding business risk and liability. Contract analysis requires the identification and classification of key provisions and paragraphs within an agreement. Identifying and validating contract clauses can be time-consuming and challenging, and it typically demands the services of trained and expensive lawyers, paralegals, or other legal assistants. Classifying legal provisions in contracts with artificial intelligence and natural language processing is complex because model training requires domain-specialized legal language and labeled data in the legal domain is scarce. General-purpose models are not effective in this context because contracts use specialized legal vocabulary that a general model may not recognize. To address this problem, we propose using a pre-trained large language model that is subsequently calibrated on a legal taxonomy. We propose LegalPro-BERT, a BERT transformer model that we fine-tune to efficiently handle the classification of legal provisions. We conducted experiments to measure metrics and compare them with current benchmark results. We found that LegalPro-BERT outperforms the previous benchmark used for comparison in this research.
Artificial Intelligence, Information Retrieval, Machine Learning
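The abstract compares LegalPro-BERT against benchmark metrics without naming them. LexGLUE evaluates LEDGAR with micro- and macro-averaged F1, so the minimal sketch below assumes those two metrics. `micro_macro_f1` is a hypothetical helper written for illustration, not code from the paper, and the provision labels in the example are toy data.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def micro_macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    labels = set(y_true) | set(y_pred)
    # Micro-F1 pools counts over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 is the unweighted mean of per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


if __name__ == "__main__":
    gold = ["Governing Laws", "Governing Laws", "Notices", "Terminations"]
    pred = ["Governing Laws", "Notices", "Notices", "Terminations"]
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro-F1={micro:.3f}, macro-F1={macro:.3f}")  # → micro-F1=0.750, macro-F1=0.778
```

For single-label data such as LEDGAR, every misclassification adds one false positive and one false negative, so micro-F1 coincides with accuracy. Macro-F1 gives each provision type equal weight and is therefore more sensitive to performance on rare classes.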
What problem does this paper attempt to address?
This paper addresses the automatic classification of legal provisions (clauses) in contracts. Specifically, the author examines how a pre-trained large language model such as BERT can be fine-tuned to classify legal provisions efficiently. The problem matters in practice because contract review is time-consuming and costly, usually requiring the services of professional and expensive lawyers, paralegals, or other legal assistants. Contracts also use specialized legal vocabulary, which makes classification with general-purpose models difficult. The paper therefore proposes taking a pre-trained language model and fine-tuning it on legal-domain data to improve classification accuracy and efficiency. The main contributions of the paper are:
1. LegalPro-BERT, a large language model based on BERT-large for classifying legal provisions.
2. The use of the LEDGAR classification dataset from the LexGLUE benchmark as the basis for comparison.
3. Experimental results showing that LegalPro-BERT outperforms the existing benchmark methods on the LEDGAR data.
4. An exploration of how data pre-processing techniques affect model accuracy, and of fine-tuning only some layers of the model to reduce overall training time and improve prediction performance.
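The fourth contribution mentions fine-tuning only some layers of the model. The summary does not say which layers are unfrozen, so the sketch below assumes one common policy: freeze the embeddings and the lower encoder layers of BERT-large (24 encoder layers) and train only the top `top_k` layers, the pooler, and the classification head. The parameter names follow the Hugging Face `transformers` naming for `BertForSequenceClassification`; `should_train` and `top_k` are hypothetical names, and the default of 4 trainable layers is an arbitrary illustration.

```python
import re

# Matches Hugging Face-style BERT parameter names such as
# "bert.encoder.layer.23.output.dense.weight" and captures the layer index.
_LAYER_RE = re.compile(r"^bert\.encoder\.layer\.(\d+)\.")


def should_train(name: str, num_layers: int = 24, top_k: int = 4) -> bool:
    """Return True if the parameter `name` should remain trainable.

    Hypothetical policy (not taken from the paper): freeze the embeddings
    and the lower encoder layers; train only the top `top_k` encoder layers,
    the pooler, and the classification head.
    """
    if name.startswith("bert.embeddings."):
        return False
    match = _LAYER_RE.match(name)
    if match:
        # Layers are numbered 0..num_layers-1; keep the highest top_k trainable.
        return int(match.group(1)) >= num_layers - top_k
    # Remaining parameters, e.g. "bert.pooler.dense.weight" and
    # "classifier.weight", are always trained.
    return True


# With PyTorch and transformers installed, the policy would be applied as:
#   for name, param in model.named_parameters():
#       param.requires_grad = should_train(name, model.config.num_hidden_layers)

if __name__ == "__main__":
    for n in [
        "bert.embeddings.word_embeddings.weight",
        "bert.encoder.layer.19.output.dense.weight",
        "bert.encoder.layer.20.output.dense.weight",
        "classifier.weight",
    ]:
        print(f"{n}: {'train' if should_train(n) else 'frozen'}")
```

Freezing the lower layers keeps the general language representations learned during pre-training and cuts the number of parameters updated per step, which is one way the training-time reduction described above could be obtained.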