User Behavior Fingerprinting With Multi-Item-Sets and Its Application in IPTV Viewer Identification

Can Yang, Lan Wang, Houwei Cao, Qihu Yuan, Yong Liu
DOI: https://doi.org/10.1109/tifs.2021.3055638
IF: 7.231
2021-01-01
IEEE Transactions on Information Forensics and Security
Abstract: User activities in cyberspace leave unique traces that can be used for user identification (UI). Individual users can be identified from their frequent activity items through statistical feature matching. However, such approaches suffer from data sparsity. In this paper, we address this problem with multi-item-set fingerprinting, which identifies users based not only on their frequent individual activity items but also on their frequent consecutive item sequences of different lengths. We also propose a new similarity metric between fingerprint vectors that combines the advantages of Jaccard distance and relative entropy distance. Furthermore, we develop a fusion decision scheme that consolidates the matching candidates generated by different similarity metrics; it improves precision at the cost of additional rejections. The proposed approaches can be used in both one-by-one matching and bipartite graph group matching. Through extensive experiments on three real user datasets, in particular a large-scale Internet Protocol Television (IPTV) viewer dataset, we demonstrate that the proposed approaches outperform state-of-the-art methods. The average matching precision reaches 93.8% for a dataset of 1,000 users and 100% for a dataset of 100 users. This work is significant for information forensics and raises a new challenge for privacy protection in cyberspace.
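The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the combined similarity metric or the fusion rule, so the sketch makes its own assumptions. A fingerprint is taken to be the counts of all runs of 1 to `max_len` consecutive items, Jaccard distance and a smoothed relative entropy (KL divergence) are computed separately, and a match is accepted only when both metrics pick the same candidate. The function names, the `max_len` and `eps` parameters, and the agreement rule are all hypothetical.

```python
from collections import Counter
import math


def fingerprint(items, max_len=3):
    """Count every run of 1..max_len consecutive items in an activity log."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(items) - n + 1):
            counts[tuple(items[i:i + n])] += 1
    return counts


def jaccard_distance(fp_a, fp_b):
    """1 - |A ∩ B| / |A ∪ B| over the item sets present in each fingerprint."""
    keys_a, keys_b = set(fp_a), set(fp_b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


def relative_entropy(fp_a, fp_b, eps=1e-6):
    """KL(P_a || P_b) over the union of item sets.

    Additive smoothing (eps) is an assumption made here so that item sets
    missing from one fingerprint do not produce a division by zero.
    """
    keys = set(fp_a) | set(fp_b)
    if not keys:
        return 0.0
    total_a = sum(fp_a.values()) + eps * len(keys)
    total_b = sum(fp_b.values()) + eps * len(keys)
    kl = 0.0
    for key in keys:
        p = (fp_a.get(key, 0) + eps) / total_a
        q = (fp_b.get(key, 0) + eps) / total_b
        kl += p * math.log(p / q)
    return kl


def fused_match(query_fp, known_fps):
    """Return the user both metrics rank first, or None (reject) if they disagree.

    This agreement rule is a simplified stand-in for a fusion decision scheme.
    """
    by_jaccard = min(known_fps, key=lambda u: jaccard_distance(query_fp, known_fps[u]))
    by_entropy = min(known_fps, key=lambda u: relative_entropy(query_fp, known_fps[u]))
    return by_jaccard if by_jaccard == by_entropy else None


# Toy usage with made-up channel sequences.
known = {
    "alice": fingerprint(["news", "sports", "news", "sports"]),
    "bob": fingerprint(["movies", "kids", "movies", "kids"]),
}
query = fingerprint(["news", "sports", "news"])
match = fused_match(query, known)
```

Requiring agreement between the two metrics reflects the trade-off stated in the abstract: a query on which the metrics disagree is rejected rather than matched, which can raise precision while leaving more queries unidentified.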
computer science, theory & methods; engineering, electrical & electronic