Demographic Information Inference Through Meta-Data Analysis of Wi-Fi Traffic

Huaxin Li, Zheyu Xu, Haojin Zhu, Di Ma, Shuai Li, Kai Xing
DOI: https://doi.org/10.1109/tmc.2017.2753244
IF: 6.075
2017-01-01
IEEE Transactions on Mobile Computing
Abstract: Privacy inference through meta-data (e.g., IP, Host) analysis of Wi-Fi traffic poses a potentially more serious threat to user privacy than content-based analysis. First, it provides a more efficient and scalable way to infer users' sensitive information without inspecting the content of Wi-Fi traffic. Second, meta-data-based demographic inference can work on both unencrypted and encrypted traffic (e.g., HTTPS traffic). In this study, we present a novel approach to inferring user demographic information by exploiting the meta-data of Wi-Fi traffic. We develop an inference framework based on machine learning and evaluate its performance on a real-world dataset that contains the Wi-Fi access records of 28,158 users over five months. The framework extracts four kinds of features from real-world Wi-Fi traffic and applies a novel machine learning technique (XGBoost) to predict user demographics. Our analytical results show that the overall accuracy of inferring users' gender and education level reaches 82 and 78 percent, respectively. Surprisingly, even for HTTPS traffic, user demographics can still be predicted with accuracies of 69 and 76 percent, respectively, which demonstrates the practicality of the proposed privacy inference scheme. Finally, we discuss and evaluate potential mitigation methods for such inference attacks.
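To make the idea of meta-data-only inference concrete, the sketch below turns per-connection meta-data into one feature vector per user without touching any payload. It is a minimal illustration under stated assumptions, not the paper's feature set: the abstract does not name its four feature kinds, so the record layout (user_id, dst_ip, host, is_https), the helper names `host_domain` and `user_features`, the domain-frequency features, and the HTTPS-ratio feature are all hypothetical. For HTTPS flows the host name is assumed to be observable from meta-data (for example the TLS Server Name Indication). In the paper, vectors of this kind would then be fed to a classifier (XGBoost); that step is not shown here.

```python
from collections import Counter, defaultdict


def host_domain(host):
    """Hypothetical helper: keep the last two DNS labels,
    e.g. 'img.shop.example.com' -> 'example.com'.
    Simplification: multi-part suffixes such as 'co.uk' are not handled."""
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host.lower()


def user_features(records, vocabulary):
    """Hypothetical feature extractor: for each user, the fraction of
    connections to each domain in `vocabulary`, plus the fraction of
    connections that used HTTPS. Only meta-data fields are read."""
    domain_counts = defaultdict(Counter)  # user -> Counter(domain -> count)
    https_counts = Counter()              # user -> number of HTTPS connections
    totals = Counter()                    # user -> total connections
    for user, _dst_ip, host, is_https in records:
        domain_counts[user][host_domain(host)] += 1
        totals[user] += 1
        if is_https:
            https_counts[user] += 1

    features = {}
    for user, n in totals.items():
        vec = [domain_counts[user][d] / n for d in vocabulary]
        vec.append(https_counts[user] / n)  # last entry: HTTPS ratio
        features[user] = vec
    return features


# Toy, made-up records purely for illustration.
records = [
    ("u1", "10.0.0.1", "img.shop.example.com", True),
    ("u1", "10.0.0.2", "news.example.org", False),
    ("u2", "10.0.0.3", "video.example.net", True),
    ("u2", "10.0.0.3", "video.example.net", True),
]
vocab = ["example.com", "example.org", "example.net"]
feats = user_features(records, vocab)
print(feats)
```

Because every feature is derived from host names and protocol flags rather than content, the same extraction applies unchanged to encrypted traffic, which is the property the abstract relies on for its HTTPS results.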
What problem does this paper attempt to address?