Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws

Kornraphop Kawintiranon,Yaguang Liu
DOI: https://doi.org/10.48550/arXiv.2105.10117
2021-05-21
Computation and Language
Abstract:General Data Protection Regulation (GDPR) becomes a standard law for data protection in many countries. Currently, twelve countries adopt the regulation and establish their GDPR-like regulation. However, to evaluate the differences and similarities of these GDPR-like regulations is time-consuming and needs a lot of manual effort from legal experts. Moreover, GDPR-like regulations from different countries are written in their languages leading to a more difficult task since legal experts who know both languages are essential. In this paper, we investigate a simple natural language processing (NLP) approach to tackle the problem. We first extract chunks of information from GDPR-like documents and form structured data from natural language. Next, we use NLP methods to compare documents to measure their similarity. Finally, we manually label a small set of data to evaluate our approach. The empirical result shows that the BERT model with cosine similarity outperforms other baselines. Our data and code are publicly available.
What problem does this paper attempt to address?