An Empirical Study on Transfer Learning for Privilege Review

Haozhen Zhao, Shi Ye, Jingchao Yang

DOI: https://doi.org/10.48550/arXiv.2112.08606

2021-12-16

Abstract: Protecting privileged communications and data from inadvertent disclosure is a paramount task in US legal practice. Traditionally, counsel rely on keyword searching and manual review to identify privileged documents in cases. As data volumes increase, this approach becomes less and less defensible in terms of cost. Machine learning methods have been used to identify privileged documents. Given the generalizable nature of privilege in legal cases, we hypothesize that transfer learning can capitalize on knowledge learned from existing labeled data to identify privileged documents without requiring new labeled training data. In this paper, we study both traditional machine learning models and deep learning models based on BERT for privileged document classification tasks in legal document review, and we examine the effectiveness of transfer learning for privilege models on three real-world datasets with privilege labels. Our results show that the BERT model outperforms the industry-standard logistic regression algorithm and that transfer learning models can achieve decent performance on datasets in the same or closely related domains.

Information Retrieval

What problem does this paper attempt to address?

The paper asks how transfer learning can improve the efficiency and accuracy of identifying privileged documents during legal document review. Traditionally, these documents are identified through keyword searches and manual review, but as data volumes grow, this approach becomes increasingly hard to justify on cost. Machine learning methods have been used to identify privileged documents, but they typically require a large amount of labeled data. The paper hypothesizes that transfer learning can carry knowledge from existing labeled data over to new datasets, reducing the need for new training data.

Specifically, the research objectives of the paper include:

1. Comparing the performance of traditional machine learning methods (such as logistic regression) with BERT-based deep learning models in identifying privileged documents.
2. Investigating how well previously trained models predict privileged documents in a zero-shot setting (i.e., the model is trained on a different dataset and applied to the new one without any new labeled training data).

Through these studies, the paper explores the potential of transfer learning in legal document review, particularly for reducing the need for labeled data and improving model generalization.
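The zero-shot transfer setup described above can be illustrated with a minimal Python sketch. It is not the paper's code or data: the documents are hypothetical toy examples, the function names are invented for illustration, and a small pure-Python bag-of-words logistic regression stands in for both the industry-standard baseline and the BERT model. The point is only the evaluation protocol: fit on a labeled source matter, then score a different target matter without using any of its labels for training.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(docs, labels, epochs=200, lr=0.1):
    """Fit a bag-of-words logistic regression with per-example gradient descent."""
    weights = {}
    bias = 0.0
    features = [Counter(tokenize(d)) for d in docs]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(weights.get(t, 0.0) * c for t, c in x.items())
            grad = sigmoid(z) - y  # derivative of the log loss w.r.t. z
            bias -= lr * grad
            for t, c in x.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * c
    return weights, bias


def predict(model, doc):
    """Return 1 (privileged) or 0 (not privileged); unseen tokens contribute 0."""
    weights, bias = model
    x = Counter(tokenize(doc))
    z = bias + sum(weights.get(t, 0.0) * c for t, c in x.items())
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical source matter with privilege labels (1 = privileged).
source_docs = [
    "please keep this legal advice from our attorney confidential",
    "attorney client privileged draft of settlement advice",
    "counsel recommends we do not disclose the litigation strategy",
    "lunch menu for the team offsite on friday",
    "quarterly sales numbers attached for review",
    "reminder the office is closed on monday",
]
source_labels = [1, 1, 1, 0, 0, 0]

# Hypothetical target matter: labels are used for evaluation only.
target_docs = [
    "our attorney sent privileged advice about a merger",
    "sales meeting moved to friday at the office",
]
target_labels = [1, 0]

model = train_logreg(source_docs, source_labels)  # trained on source only
preds = [predict(model, d) for d in target_docs]  # zero-shot on target
zero_shot_accuracy = sum(p == y for p, y in zip(preds, target_labels)) / len(target_labels)
print("zero-shot predictions:", preds)
print("zero-shot accuracy:", zero_shot_accuracy)
```

On real review data one would report precision and recall rather than accuracy, because privileged documents are usually a small minority. Transfer is also likely to weaken as the target vocabulary drifts from the source, which is consistent with the abstract's finding that transfer works best on datasets in the same or close domains.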


An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters

CNN Application in Detection of Privileged Documents in Legal Document Review

How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

Image Analytics for Legal Document Review: A Transfer Learning Approach

Empirical Study of LLM Fine-Tuning for Text Classification in Legal Document Review

Empirical Evaluations of Active Learning Strategies in Legal Document Review

Analysis of Privacy Leakage in Federated Large Language Models

Providing More Efficient Access To Government Records: A Use Case Involving Application of Machine Learning to Improve FOIA Review for the Deliberative Process Privilege

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction

An In-Depth Evaluation of Federated Learning on Biomedical Natural Language Processing

Model-Based Differentially Private Knowledge Transfer for Large Language Models

Privacy-preserving Transfer Learning for Knowledge Sharing

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale

Learning to Transfer Privileged Information

Transfer Learning for Security: Challenges and Future Directions

Can Language Models be Instructed to Protect Personal Information?

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

BERT-PLI: Modeling Paragraph-Level Interactions for Legal Case Retrieval

Beyond Memorization: Violating Privacy Via Inference with Large Language Models