ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing

Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh
DOI: https://doi.org/10.48550/arXiv.2106.13403
2021-06-25
Computation and Language
Abstract: Ambiguity is a characteristic of natural language that makes the expression of ideas flexible. However, in a domain that requires precise statements, it becomes a barrier. Specifically, a single word can have many meanings, and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original sentence to produce a correct translation. Building on that observation, in this paper we propose ParaLaw Nets, a family of pretrained models that use sentence-level cross-lingual information to reduce ambiguity and improve performance in legal text processing. This approach achieved the best result in the Question Answering task of COLIEE-2021.