Abstract: We present the first comprehensive empirical evaluation of pre-trained language models (PLMs) for legal natural language processing (NLP) in order to examine their effectiveness in this domain. Our study covers eight representative and challenging legal datasets, ranging from 900 to 57K samples, across five NLP tasks: binary classification, multi-label classification, multiple-choice question answering, summarization, and information retrieval. We first run unsupervised, classical machine learning and/or non-PLM-based deep learning methods on these datasets, and show that the performance of these baseline systems can be 4% to 35% lower than that of PLM-based methods. Next, we compare general-domain PLMs with those specifically pre-trained for the legal domain, and find that domain-specific PLMs demonstrate 1% to 5% higher performance than general-domain models, but only when the datasets are extremely close to the pre-training corpora. Finally, we evaluate six general-domain state-of-the-art systems, and show that they have limited generalizability to legal data, with performance gains from 0.1% to 1.2% over other PLM-based methods. Our experiments suggest that both general-domain and domain-specific PLM-based methods generally achieve better results than simpler methods on most tasks, with the exception of the retrieval task, where the best-performing baseline outperformed all PLM-based methods by at least 5%. Our findings can help legal NLP practitioners choose appropriate methods for different tasks, and also shed light on potential future directions for legal NLP research.

Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work

A Study of Pre-trained Language Models in Natural Language Processing

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Pre-Trained Language Models and Their Applications

Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings

A Survey on Knowledge-Enhanced Pre-trained Language Models

How Does Pretraining Improve Discourse-Aware Translation?

Revisiting k-NN for Fine-Tuning Pre-trained Language Models

LERT: A Linguistically-motivated Pre-trained Language Model

Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning

Pre-train, Prompt and Recommendation: A Comprehensive Survey of Language Modelling Paradigm Adaptations in Recommender Systems

Impossible Triangle: What's Next for Pre-trained Language Models?

Pre-trained models for natural language processing: A survey

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

A Survey of Knowledge Enhanced Pre-trained Language Models

Pretrained Language Models for Text Generation: A Survey

Pre-Training a Language Model Without Human Language

On the Effectiveness of Pre-Trained Language Models for Legal Natural Language Processing: An Empirical Study

In-context Pretraining: Language Modeling Beyond Document Boundaries

ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models