Building a Production-Ready Multi-Label Classifier for Legal Documents with Digital-Twin-Distiller

Gergely Márk Csányi, Renátó Vági, Dániel Nagy, István Üveges, János Pál Vadász, Andrea Megyeri, Tamás Orosz
DOI: https://doi.org/10.3390/app12031470
2022-01-29
Applied Sciences
Abstract: One of the most time-consuming parts of an attorney’s job is finding similar legal cases. Categorizing legal documents by subject matter can significantly increase the discoverability of digitized court decisions. This is a multi-label classification problem, in which each relatively long text can belong to more than one legal category. This paper presents a solution that decomposes the multi-label classification problem into more than a hundred binary classification problems. Several approaches were tested, including different machine-learning and text-augmentation techniques, to produce a practically applicable model. The proposed models and methodologies were encapsulated and deployed as a digital twin into a production environment. The resulting machine-learning-based application reaches human-expert performance on this monotonous and labor-intensive task and could also improve on it. It could increase the e-discoverability of the documents by about 50%.
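Decomposing a multi-label problem into one binary problem per label is commonly called binary relevance (one-vs-rest): each label gets its own yes/no classifier, and a document's predicted label set is the union of the labels whose classifiers fire. The following is a minimal sketch of that idea only, assuming a lowercase bag-of-words tokenizer and a plain perceptron as a stand-in per-label classifier. The paper's actual models, features, and text augmentation are not reproduced here, and every function name, label, and text below is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> Counter:
    """Hypothetical feature extractor: lowercased bag-of-words counts."""
    return Counter(text.lower().split())


class Perceptron:
    """Minimal binary perceptron over sparse word counts (illustrative stand-in)."""

    def __init__(self, epochs: int = 10) -> None:
        self.epochs = epochs
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def score(self, feats: Counter) -> float:
        return self.bias + sum(self.weights.get(w, 0.0) * c for w, c in feats.items())

    def predict(self, feats: Counter) -> int:
        return 1 if self.score(feats) > 0 else 0

    def fit(self, docs: List[Counter], targets: List[int]) -> "Perceptron":
        for _ in range(self.epochs):
            for feats, y in zip(docs, targets):
                if self.predict(feats) != y:
                    # Classic perceptron update: move toward positives, away from negatives.
                    step = 1.0 if y == 1 else -1.0
                    self.bias += step
                    for w, c in feats.items():
                        self.weights[w] = self.weights.get(w, 0.0) + step * c
        return self


def decompose(label_sets: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """Binary relevance: turn multi-label targets into one 0/1 vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def train_binary_relevance(
    texts: List[str], label_sets: List[Set[str]], epochs: int = 10
) -> Dict[str, Perceptron]:
    """Train one independent binary classifier per label."""
    labels = sorted(set().union(*label_sets))
    docs = [tokenize(t) for t in texts]
    targets = decompose(label_sets, labels)
    return {lab: Perceptron(epochs).fit(docs, targets[lab]) for lab in labels}


def predict_labels(models: Dict[str, Perceptron], text: str) -> Set[str]:
    """A document receives every label whose binary classifier predicts 1."""
    feats = tokenize(text)
    return {lab for lab, model in models.items() if model.predict(feats) == 1}


# Toy usage with hypothetical texts and categories; the third document has two labels.
texts = ["tenant rent lease", "employer wage dismissal", "tenant employer wage lease"]
label_sets = [{"property"}, {"labour"}, {"property", "labour"}]
models = train_binary_relevance(texts, label_sets)
print(predict_labels(models, "rent dispute"))    # {'property'}
print(predict_labels(models, "wage dismissal"))  # {'labour'}
```

Because each label is trained independently, adding a new category only means training one more binary model, which is one practical reason to decompose a problem with more than a hundred labels this way. The trade-off is that plain binary relevance ignores correlations between labels.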
What problem does this paper attempt to address?