Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

Ali Taghibakhshi, Mingyuan Ma, Ashwath Aithal, Onur Yilmaz, Haggai Maron, Matthew West
2023-10-20
Abstract: Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.
Machine Learning, Artificial Intelligence, Cryptography and Security, Social and Information Networks
What problem does this paper attempt to address?
The paper addresses cross-device user matching: identifying and linking the different devices that belong to the same person, primarily from sequence logs. The problem is important in fields such as advertising, recommender systems, and cybersecurity.

### Background and Challenges

1. **Background**:
   - Users carry out online activity on multiple devices, and because the activity on each device is treated as an independent entity, companies often struggle to recognize the same user across devices.
   - Automatically identifying one user's activity across devices helps in understanding human behavior patterns and has applications in user profiling, online advertising, and system security.
2. **Challenges**:
   - Long-range dependencies and higher-order connections in sequence logs are difficult to capture.
   - Traditional data mining techniques handle these dependencies and connections poorly.
   - Because of privacy protection, user identifiers are usually unavailable, so matching must rely on browsing records alone (such as URL access logs).

### Existing Methods and Their Deficiencies

1. **Traditional Methods**:
   - Hand-crafted features (such as TF-IDF and URL access-time features).
   - Deep learning encoders for sequence logs, such as 2D convolutional neural networks (CNNs) and LSTMs.
   - These methods focus mainly on local interactions and struggle to capture whole-sequence or higher-level patterns.
2. **Limitations of Existing Methods**:
   - **TGCE Method**: Long-range information is propagated through a two-tier structure, but there are two main problems:
     - Random walks may connect two URLs that are separated by a long time interval, which introduces inaccurate information.
     - In the final pairwise classification task, the generated graph embeddings are combined by element-wise multiplication before being passed to a fully connected network, which may lose key features.

### The Method Proposed in the Paper

1. **New Hierarchical Graph Neural Network (HGNN)**:
   - **Hierarchical Structure**: URL nodes are treated as fine nodes, one coarse node is assigned to every K consecutive fine nodes, and message passing between the two levels carries information efficiently over long ranges.
   - **Cross-Attention Mechanism**: In the pairwise classification task, a cross-attention mechanism performs element-wise cross-encoding of the learned embeddings, which improves model performance.
2. **Main Contributions**:
   - A hierarchical heterogeneous graph model that is 6 times faster than the existing state-of-the-art method while maintaining competitive accuracy.
   - A cross-attention mechanism that improves the accuracy of the overall method by about 5%.

### Experimental Results

- **Dataset**: The CIKM Cup 2016 competition dataset provided by Data Centric Alliance.
- **Performance Comparison**:
  - In terms of F1 score, the proposed "HGNN+Cross-Att" method improves by 5% over the existing state-of-the-art method (TGCE).
  - In terms of training time, the HGNN model is 6 times faster than TGCE, while the HGNN+Cross-Att model takes the same training time as TGCE but performs better.

### Conclusion

The paper proposes a new graph neural network architecture that addresses cross-device user matching through a hierarchical structure and a cross-attention mechanism. The experimental results show that HGNN trains substantially faster than TGCE at competitive accuracy, and that adding cross-attention yields higher accuracy than TGCE at the same training time.
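
To make the hierarchical idea concrete, here is a minimal sketch of assigning one coarse node to every K consecutive URL nodes and checking how much shorter the path between distant visits becomes. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_hierarchy` and `shortest_hops`, the node labels `f0`, `c0`, and the choice to link consecutive fine nodes and consecutive coarse nodes are all hypothetical, since the summary does not specify how edges are formed.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_hierarchy(urls: List[str], k: int) -> Dict[str, List[Edge]]:
    """Build edge lists for a two-level graph over one sequence log.

    Fine node f_i stands for the i-th URL visit; coarse node c_j covers
    visits j*k .. j*k + k - 1. Edge choices below are assumptions.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(urls)
    n_coarse = (n + k - 1) // k  # ceil(n / k)
    # Assumption: consecutive visits are linked at the fine level.
    fine_edges = [(f"f{i}", f"f{i + 1}") for i in range(n - 1)]
    # Each fine node belongs to exactly one coarse node.
    membership = [(f"f{i}", f"c{i // k}") for i in range(n)]
    # Assumption: consecutive coarse nodes are linked as well.
    coarse_edges = [(f"c{j}", f"c{j + 1}") for j in range(n_coarse - 1)]
    return {
        "fine_edges": fine_edges,
        "membership": membership,
        "coarse_edges": coarse_edges,
    }


def shortest_hops(edges: List[Edge], src: str, dst: str) -> int:
    """Breadth-first search over undirected edges; returns -1 if unreachable."""
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return -1


if __name__ == "__main__":
    graph = build_hierarchy([f"url_{i}" for i in range(12)], k=4)
    fine_only = graph["fine_edges"]
    full = fine_only + graph["membership"] + graph["coarse_edges"]
    print(shortest_hops(fine_only, "f0", "f11"))  # 11 hops on the fine level
    print(shortest_hops(full, "f0", "f11"))       # 4 hops via coarse nodes
```

With 12 visits and K = 4, the first and last visits are 11 hops apart on the fine level alone but only 4 hops apart once the coarse level is added, which is the sense in which fewer rounds of message passing are needed to cover long ranges.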
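
The cross-attention step in the pairwise classifier can be sketched in the same spirit. The code below uses plain scaled dot-product attention in which the embeddings of one device's log act as queries over the embeddings of the other. Everything beyond that idea is an assumption made for illustration: the names `cross_attend`, `mean_pool`, and `pair_features` are hypothetical, no learned projection matrices are applied, and mean pooling followed by concatenation is only one possible way to form the input to a classifier. The summary does not describe the paper's actual layer in this detail.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention of each query row over the context rows.

    Assumption: identity projections, so keys and values are the context
    embeddings themselves.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in context]
        weights = softmax(scores)
        # Each output row is a weighted average of the context rows.
        attended.append(
            [sum(w * row[c] for w, row in zip(weights, context)) for c in range(dim)]
        )
    return attended


def mean_pool(rows: Matrix) -> List[float]:
    """Average the rows of a matrix into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pair_features(emb_a: Matrix, emb_b: Matrix) -> List[float]:
    """Concatenate A-attending-to-B and B-attending-to-A summaries.

    Assumption: this pooled vector would feed a downstream classifier.
    """
    a_given_b = mean_pool(cross_attend(emb_a, emb_b))
    b_given_a = mean_pool(cross_attend(emb_b, emb_a))
    return a_given_b + b_given_a


if __name__ == "__main__":
    device_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    device_b = [[0.5, 0.5], [1.0, 0.0]]
    print(pair_features(device_a, device_b))
```

Unlike a single element-wise product of two pooled graph embeddings, each node embedding here is compared against every node embedding of the other log before pooling, so pairwise interactions are available to the classifier.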