An MLM Decoding Space Enhancement for Legal Document Proofreading

Jinlong Liu, Xudong Luo
DOI: https://doi.org/10.1007/978-981-97-5492-2_5
2024-01-01
Abstract: Legal documents demand precise and accurate language, leaving no room for spelling, redundancy, omission, or word-order errors. To address this, this paper expands the decoding space of the Masked Language Model (MLM) by introducing "insertion" and "deletion" editing labels, so that the MLM can handle variable-length grammatical errors rather than only fixed-length spelling errors. To tackle data sparsity, we design and implement a rule-based data augmentation strategy. Experiments show that our model outperforms state-of-the-art baselines on a dataset of annotated legal documents, indicating its potential as a tool for legal document preparation and revision.
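To make the idea of editing labels concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical label format with one label per source token, and it shows how "insertion" and "deletion" labels let the output length differ from the input length. The second function sketches what a rule-based augmentation step might look like when it corrupts a clean sentence into a training example. The names `apply_edit_labels` and `inject_error`, the label names, and the error kinds are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical label format (not taken from the paper):
#   "KEEP"            keep the source token
#   "DELETE"          drop the source token (redundancy error)
#   ("REPLACE", t)    substitute token t (spelling error)
#   ("INSERT", t)     keep the source token and add t after it (omission error)
Label = Union[str, Tuple[str, str]]


def apply_edit_labels(tokens: List[str], labels: List[Label]) -> List[str]:
    """Decode one label per source token into a corrected token sequence."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per source token")
    output: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "KEEP":
            output.append(token)
        elif label == "DELETE":
            continue  # output becomes shorter than the input
        elif isinstance(label, tuple) and label[0] == "REPLACE":
            output.append(label[1])
        elif isinstance(label, tuple) and label[0] == "INSERT":
            output.extend([token, label[1]])  # output becomes longer
        else:
            raise ValueError(f"unknown label: {label!r}")
    return output


def inject_error(tokens: List[str], kind: str, pos: int) -> List[str]:
    """Corrupt a clean sentence with one rule-based error (assumed rules)."""
    noisy = list(tokens)
    if kind == "redundancy":
        noisy.insert(pos, noisy[pos])  # duplicate the token at pos
    elif kind == "omission":
        del noisy[pos]  # drop the token at pos
    elif kind == "disorder":
        noisy[pos], noisy[pos + 1] = noisy[pos + 1], noisy[pos]  # swap neighbours
    else:
        raise ValueError(f"unknown error kind: {kind!r}")
    return noisy


source = ["the", "the", "court", "dismissed", "appeal"]
labels = ["DELETE", "KEEP", "KEEP", ("INSERT", "the"), "KEEP"]
corrected = apply_edit_labels(source, labels)
```

Here `corrected` reads "the court dismissed the appeal": one redundant token is removed and one missing token is restored in a single pass. In this sketch a word-order error has no dedicated label and would be expressed as a deletion combined with an insertion.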