Data Quality for Deep Learning of Judgment Documents: an Empirical Study

Jiawei Liu,Dong Wang,Zhenzhen Wang,Zhenyu Chen
DOI: https://doi.org/10.1007/978-981-15-3412-6_5
2020-01-01
Abstract:The revolution in hardware technology has made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is widely used in various fields, and the judicial area is no exception. As the carrier of the litigation activities, the judgment documents record the process and results of the people's courts, and their quality directly affects the fairness and credibility of the law. To be able to measure the quality of judgment documents, the interpretability of judgment documents has been an indispensable dimension. Unfortunately, due to the various uncontrollable factors during the process, such as data transmission and storage, The data set for training usually has a poor quality. Besides, due to the severe imbalance of the distribution of case data, data augmentation is essential to generate data for low-frequency cases. Based on the existing data set and the application scenarios, we explore data quality issues in four areas. Then we systematically investigate them to figure out their impact on the data set. After that, we compare the four dimensions to find out which one has the most considerable damage to the data set.
What problem does this paper attempt to address?