GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs

Sheng Tian, Xintan Zeng, Yifei Hu, Baokun Wang, Yongchao Liu, Yue Jin, Changhua Meng, Chuntao Hong, Tianyi Zhang, Weiqiang Wang
DOI: https://doi.org/10.1007/978-3-031-70381-2_9
2024-11-11
Abstract: Graph-based patterns are extensively employed and favored by practitioners within industrial companies due to their capacity to represent the behavioral attributes and topological relationships among users, thereby offering enhanced interpretability in comparison to black-box models commonly utilized for classification and recognition tasks. For instance, within the scenario of transaction risk management, a graph pattern that is characteristic of a particular risk category can be readily employed to discern transactions fraught with risk, delineate networks of criminal activity, or investigate the methodologies employed by fraudsters. Nonetheless, graph data in industrial settings is often characterized by its massive scale, encompassing data sets with millions or even billions of nodes, making the manual extraction of graph patterns not only labor-intensive but also necessitating specialized knowledge in particular domains of risk. Moreover, existing methodologies for mining graph patterns encounter significant obstacles when tasked with analyzing large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-purpose parallel and distributed risk pattern mining framework on large attributed graphs. The framework incorporates a novel edge-involved graph isomorphism network alongside optimized operations for parallel graph computation, which collectively contribute to a considerable reduction in computational complexity and resource expenditure. Moreover, the intelligent filtration of efficacious risky graph patterns is facilitated by the proposed evaluation metrics. Comprehensive experimental evaluations conducted on real-world datasets of varying sizes substantiate the capability of GraphRPM to adeptly address the challenges inherent in mining patterns from large-scale industrial attributed graphs, thereby underscoring its substantial value for industrial deployment.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Social and Information Networks
What problem does this paper attempt to address?
The main problem this paper addresses is **mining risk patterns (Risk Pattern Mining) in large-scale industrial attributed graphs**, especially for fraud detection in financial transaction scenarios. Specifically, the paper targets two challenges:

1. **Handling large-scale attributed graph data**:
   - In many practical applications, graph topology alone is not sufficient to describe risk scenarios accurately. Depicting entities precisely requires the high-dimensional attributes of nodes or edges.
   - Most existing methods handle only one-dimensional attributes or cannot handle attributed graphs effectively.
2. **Insufficient scalability**:
   - Graph data in industrial environments is usually very large, containing millions or even billions of nodes. Existing graph pattern mining methods lack effective computational optimization strategies and struggle with data at this scale.
   - This limitation significantly restricts their applicability to industrial tasks, which demand processing and analysis capabilities that match the volume and complexity of the data.

To solve these problems, the paper introduces the GraphRPM framework, which has the following key features:

- **Edge-Involved Graph Isomorphism Network (EGIN)**: A new graph isomorphism network designed for fuzzy matching of graph patterns with high-dimensional attributes, balancing computational complexity against accuracy.
- **Two-stage mining strategy**: Combined with a parallel and distributed processing framework, it reduces computational redundancy and improves efficiency. In the first stage, only node features are used for pattern mapping. In the second stage, edge features are introduced for pattern merging, ultimately yielding risk patterns with strong discriminative power.
- **Pattern Risk Score**: An evaluation metric for identifying important risk patterns. The precision and recall of a pattern are calculated and combined into the pattern risk score (Rs), which quantifies the reliability and relevance of the pattern in identifying financial risks.

In conclusion, GraphRPM aims to provide a robust and efficient framework for mining discriminative graph patterns from large-scale industrial attributed graphs, especially risk patterns related to fraudulent behavior. This helps financial fraud detection and can also be applied to other industrial and commercial analysis fields.
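As a rough illustration of the pattern risk score described above, the following Python sketch computes a pattern's precision and recall and combines them into a single value. The summary does not say how the two metrics are combined, so the weighted harmonic mean (an F-beta style score) used here is an assumption, not the paper's formula. The names `PatternStats`, `risk_score`, and `beta` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Match counts for one candidate graph pattern (hypothetical structure)."""
    matched_risky: int  # risky samples the pattern matches
    matched_total: int  # all samples the pattern matches
    risky_total: int    # all risky samples in the dataset


def precision(stats: PatternStats) -> float:
    """Share of the pattern's matches that are actually risky."""
    if stats.matched_total == 0:
        return 0.0
    return stats.matched_risky / stats.matched_total


def recall(stats: PatternStats) -> float:
    """Share of all risky samples that the pattern covers."""
    if stats.risky_total == 0:
        return 0.0
    return stats.matched_risky / stats.risky_total


def risk_score(stats: PatternStats, beta: float = 1.0) -> float:
    """Combine precision and recall into one score.

    ASSUMPTION: a weighted harmonic mean (F-beta) stands in for the
    paper's Rs, whose exact definition is not given in the summary.
    """
    p, r = precision(stats), recall(stats)
    denom = beta ** 2 * p + r
    if denom == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / denom


# Rank candidate patterns so the most discriminative ones come first.
candidates = {
    "pattern_a": PatternStats(matched_risky=40, matched_total=50, risky_total=100),
    "pattern_b": PatternStats(matched_risky=10, matched_total=200, risky_total=100),
}
ranked = sorted(candidates, key=lambda name: risk_score(candidates[name]), reverse=True)
```

Under this assumed formula, a pattern scores highly only when it is both reliable (high precision) and relevant (high recall), which matches the role the summary assigns to Rs. A threshold on the score could then be used to filter candidate patterns.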