A Systematic Method on PDF Privacy Leakage Issues

Yun Feng, Baoxu Liu, Xiang Cui, Chaoge Liu, Xuebin Kang, Junwei Su
DOI: https://doi.org/10.1109/trustcom/bigdatase.2018.00144
2018-08-01
Abstract: PDF is used extensively worldwide, and a vast number of PDF documents are disseminated over the Internet as people exchange documents. Private information hidden in the PDF document structure travels with these documents, which causes privacy leakage. To systematically analyze and address this issue, we conduct a series of studies. We identify possible sources of personal privacy leaks in PDF and design a methodology to extract and recognize sensitive information automatically. Our methodology helps users check whether their PDF documents contain private information before transmitting them over the Internet. We conduct an experiment, and the results indicate the effectiveness of our method. We then examine tens of thousands of benign and malicious PDF documents gathered from multiple sources around the world to analyze the current state of privacy leakage in PDF documents. Our analysis demonstrates that nearly 70% of people lack awareness of privacy protection when using PDF documents. We also discuss a special use of our method in cyberattack attribution.
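The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of the general idea of pulling potentially identifying fields out of a PDF before sharing it, not the authors' method. It assumes the fields of interest are the standard document information dictionary keys (such as /Author, /Creator, and /Producer) stored as plain literal strings in an unencrypted, uncompressed object. The function names `extract_info_fields` and `flag_sensitive`, and the rule for what counts as sensitive, are hypothetical.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")

# Matches entries such as "/Author (Jane Doe)".
# Assumption: only simple literal strings are handled (no nested or escaped
# parentheses, no hex strings, no compressed object streams, no encryption).
_ENTRY = re.compile(
    rb"/(" + b"|".join(k.encode("ascii") for k in INFO_KEYS) + rb")\s*\(([^()\\]*)\)"
)

# Rough email pattern, used only as an illustrative sensitivity rule.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_info_fields(pdf_bytes):
    """Return a dict of information-dictionary fields found in raw PDF bytes."""
    fields = {}
    for key, value in _ENTRY.findall(pdf_bytes):
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields


def flag_sensitive(fields):
    """Hypothetical rule: flag a non-empty Author, or any field holding an email."""
    flagged = set()
    for key, value in fields.items():
        if key == "Author" and value.strip():
            flagged.add(key)
        if _EMAIL.search(value):
            flagged.add(key)
    return sorted(flagged)


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n1 0 obj\n"
        b"<< /Author (Jane Doe) /Creator (Writer) /Producer (ExamplePDF 1.0) >>\n"
        b"endobj\n"
    )
    info = extract_info_fields(sample)
    print(info)
    print(flag_sensitive(info))
```

A user could run a check like this on a file's bytes before sending it and review any flagged fields. A real tool would need a full PDF parser to handle compressed streams, XMP metadata, and encrypted documents, which this sketch deliberately ignores.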