Fingerprinting Search Keywords over HTTPS at Scale

Junhua Yan, Hasan Faik Alan, Jasleen Kaur
DOI: https://doi.org/10.48550/arXiv.2008.08161
2020-08-18
Cryptography and Security
Abstract: The possibility of fingerprinting the search keywords issued by a user on popular web search engines is a significant threat to user privacy. This threat has received surprisingly little attention in the network traffic analysis literature. In this work, we consider the problem of keyword fingerprinting of HTTPS traffic. We study the impact of several factors, including client platform diversity, choice of search engine, feature sets, and classification frameworks. We conduct both closed-world and open-world evaluations using nearly 4 million search queries collected over a period of three months. Our analysis reveals several insights into the threat of keyword fingerprinting in modern HTTPS traffic.