The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad Elwany
DOI: https://doi.org/10.48550/arXiv.2107.08128
2021-07-17
Abstract: Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much longer, and the segmentation strategies typically used on longer documents miss out on document structure and contextual information, hurting their results on downstream tasks. In our work on legal agreements, we find that visual cues such as layout, style, and placement of text in a document are strong features that are crucial to achieving an acceptable level of accuracy on long documents. We measure the impact of incorporating such visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks including document segmentation, entity extraction, and attribute classification. Our method of segmenting documents based on structural metadata out-performs existing methods on four long-document understanding tasks as measured on the Contract Understanding Atticus Dataset.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of understanding the structure of long documents, in particular legal contracts. Large pre-trained Transformer models such as BERT can typically consider only 512 tokens at a time, while many real-world documents are much longer. The segmentation strategies commonly used to fit such documents into the model lose document structure and contextual information, which hurts performance on downstream tasks. The paper examines how visual cues (layout, style, and the placement of text in the document) can improve accuracy on long documents, specifically for document segmentation, entity extraction, and attribute classification. The authors propose segmenting documents using structural metadata derived from Optical Character Recognition (OCR), which better preserves document structure and outperforms existing methods on four contract understanding tasks, as measured on the Contract Understanding Atticus Dataset.
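To make the contrast between segmentation strategies concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes the document has already been split into sections (for example, from layout metadata) and that tokens are plain strings. The function names `fixed_windows` and `structure_aware_segments` are hypothetical, and the greedy packing rule is an illustrative choice.

```python
from typing import List

MAX_TOKENS = 512  # per-input token limit cited in the abstract


def fixed_windows(tokens: List[str], size: int = MAX_TOKENS) -> List[List[str]]:
    """Naive baseline: cut every `size` tokens, ignoring document structure."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def structure_aware_segments(
    sections: List[List[str]], size: int = MAX_TOKENS
) -> List[List[str]]:
    """Greedily pack whole sections into segments of at most `size` tokens.

    Assumption: `sections` comes from some structural signal (headings,
    layout blocks). A section longer than `size` falls back to fixed windows.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for section in sections:
        if len(section) > size:
            # Flush what we have, then split the oversized section naively.
            if current:
                segments.append(current)
                current = []
            segments.extend(fixed_windows(section, size))
        elif len(current) + len(section) > size:
            # Adding this section would overflow: start a new segment.
            segments.append(current)
            current = list(section)
        else:
            current.extend(section)
    if current:
        segments.append(current)
    return segments


# Toy document: three "sections" of 300, 300, and 100 placeholder tokens.
sections = [["a"] * 300, ["b"] * 300, ["c"] * 100]
flat = [tok for sec in sections for tok in sec]

naive = fixed_windows(flat)
aware = structure_aware_segments(sections)
```

In this toy case the fixed-window baseline cuts the second section in the middle, while the structure-aware version keeps every section intact at the cost of less evenly filled segments.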