Abstract: Graph-based patterns are widely used and favored by practitioners in industrial companies because they capture both the behavioral attributes of users and the topological relationships among them, which makes them more interpretable than the black-box models commonly used for classification and recognition tasks. For instance, in transaction risk management, a graph pattern characteristic of a particular risk category can readily be used to detect risky transactions, delineate criminal networks, or investigate the methods fraudsters use. However, industrial graph data is often massive, with millions or even billions of nodes, so extracting graph patterns manually is labor-intensive and requires specialized knowledge of particular risk domains. Moreover, existing graph pattern mining methods face significant obstacles when applied to large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-purpose parallel and distributed risk pattern mining framework for large attributed graphs. The framework incorporates a novel edge-involved graph isomorphism network together with optimized operations for parallel graph computation, which jointly reduce computational complexity and resource consumption considerably. In addition, the proposed evaluation metrics enable intelligent filtering of effective risky graph patterns. Comprehensive experiments on real-world datasets of varying sizes show that GraphRPM can effectively address the challenges of mining patterns from large-scale industrial attributed graphs, underscoring its substantial value for industrial deployment.
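The abstract does not spell out EGIN's update rule. As a rough illustration of what an "edge-involved" GIN layer can look like, the sketch below swaps in the well-known GINE update of Hu et al. (2020), in which each neighbour message adds the edge's feature vector to the neighbour's state before aggregation: h_v' = MLP((1 + ε) h_v + Σ_u ReLU(h_u + e_uv)). This is an assumption, not the paper's EGIN; the function names, the two-matrix MLP, and the requirement that edge and node features share one dimension are choices made here for brevity.

```python
import numpy as np

def gine_layer(h, edges, edge_feat, W1, W2, eps=0.0):
    """One edge-aware GIN-style layer (GINE update, a stand-in for EGIN).

    h:         (n, d) node states
    edges:     list of (src, dst) directed edges
    edge_feat: (m, d) edge features aligned with `edges` (assumed same dim as h)
    W1, W2:    (d, d) weights of a two-layer MLP with a ReLU in between
    """
    agg = np.zeros_like(h)
    for (u, v), e in zip(edges, edge_feat):
        # Message from u to v mixes u's state with the features of edge (u, v).
        agg[v] += np.maximum(h[u] + e, 0.0)
    z = (1.0 + eps) * h + agg
    return np.maximum(z @ W1, 0.0) @ W2

def graph_embedding(h, edges, edge_feat, layers, eps=0.0):
    """Apply stacked layers, then sum-pool node states into one vector."""
    for W1, W2 in layers:
        h = gine_layer(h, edges, edge_feat, W1, W2, eps)
    # Sum pooling does not depend on node order, so relabelled copies
    # of the same subgraph map to the same embedding.
    return h.sum(axis=0)
```

Because the readout is a sum, the embedding is invariant to node relabelling, and because edge features enter every message, two subgraphs with identical node attributes but different edge attributes generally receive different embeddings; similar candidate patterns can then be compared by vector distance.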

What problem does this paper attempt to address?

The main problem this paper addresses is **mining risk patterns in large-scale industrial attributed graphs**, with a particular focus on fraud detection in financial transaction scenarios. Specifically, the paper proposes solutions to two challenges:

1. **Handling large-scale graphs with attributes**:
   - In many practical applications, graph topology alone cannot describe a risk scenario accurately; describing entities in finer detail requires high-dimensional attributes on nodes or edges.
   - Most existing methods handle only one-dimensional attributes or cannot handle attributed graphs effectively.
2. **Insufficient scalability**:
   - Industrial graphs routinely contain millions or even billions of nodes, and existing graph pattern mining methods lack the computational optimizations needed to cope with data at this scale.
   - This significantly limits their applicability to industrial tasks, which demand processing and analysis capacity that matches the volume and complexity of the data.

To address these problems, the paper introduces the GraphRPM framework, whose key features are:

- **Edge-Involved Graph Isomorphism Network (EGIN)**: a new graph isomorphism network designed for fuzzy matching of graph patterns with high-dimensional attributes, balancing computational complexity against accuracy.
- **Two-stage mining strategy**: combined with a parallel, distributed processing framework, it reduces computational redundancy and improves efficiency. In the first stage, only node features are used for pattern mapping; in the second stage, edge features are introduced for pattern merging, yielding risk patterns with strong discriminative power.
- **Pattern Risk Score**: an evaluation metric for identifying important risk patterns. The precision and recall of each pattern are computed and combined into a pattern risk score (Rs), which quantifies how reliably and relevantly the pattern identifies financial risk.

In conclusion, GraphRPM aims to provide a robust and efficient framework for mining discriminative graph patterns, especially patterns associated with fraudulent behavior, from large-scale industrial attributed graphs. Beyond financial fraud detection, it can also be applied to other industrial and commercial analysis fields.
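The summary gives only the outline of the two-stage strategy. The hypothetical sketch below illustrates why splitting the work can reduce redundancy: stage 1 buckets candidate subgraphs by a cheap, order-independent key built only from node labels, and stage 2 compares the more expensive edge-aware embeddings only within a bucket, greedily merging candidates whose cosine similarity reaches a threshold. Since buckets are independent, they could also be processed in parallel. The candidate format (`node_labels`, `edge_emb`), the multiset key, the greedy merge, and `sim_threshold` are assumptions made for illustration, not GraphRPM's actual mapping and merging procedures.

```python
from collections import Counter, defaultdict
import numpy as np

def two_stage_mine(candidates, sim_threshold=0.9):
    """Hypothetical two-stage grouping of candidate subgraphs into patterns.

    candidates: list of dicts with
      "node_labels": hashable (discretized) node attributes of the subgraph
      "edge_emb":    1-D edge-aware embedding of the subgraph
    Returns a list of patterns, each a list of candidate indices.
    """
    # Stage 1 (pattern mapping): bucket by the multiset of node labels only.
    buckets = defaultdict(list)
    for idx, cand in enumerate(candidates):
        key = frozenset(Counter(cand["node_labels"]).items())
        buckets[key].append(idx)

    patterns = []
    # Stage 2 (pattern merging): compare edge-aware embeddings within a bucket.
    for members in buckets.values():
        groups = []  # (unit vector of the group's first member, member ids)
        for idx in members:
            v = np.asarray(candidates[idx]["edge_emb"], dtype=float)
            v = v / (np.linalg.norm(v) + 1e-12)
            for rep, ids in groups:
                if float(rep @ v) >= sim_threshold:
                    ids.append(idx)  # merge into the first similar group
                    break
            else:
                groups.append((v, [idx]))  # no similar group: start a new one
        patterns.extend(ids for _, ids in groups)
    return patterns
```

For example, candidates with node labels `["merchant", "card", "card"]` and `["card", "merchant", "card"]` fall into the same stage-1 bucket, and stage 2 merges them only if their edge-aware embeddings point in nearly the same direction.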

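The summary states that Rs combines a pattern's precision and recall but does not give the exact formula. A minimal sketch, assuming an F-beta style harmonic combination (beta = 1 gives the F1 score); the function name, the set-based inputs, and the `beta` parameter are hypothetical.

```python
def pattern_risk_score(matched, risky, beta=1.0):
    """Hypothetical Rs: F-beta combination of a pattern's precision and recall.

    matched: set of entity ids (e.g. transactions) hit by the pattern
    risky:   set of entity ids labelled risky in the ground truth
    """
    if not matched or not risky:
        return 0.0
    tp = len(matched & risky)
    precision = tp / len(matched)  # share of matched entities that are risky
    recall = tp / len(risky)       # share of risky entities the pattern finds
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(pattern_risk_score({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 8, 9}))  # → 0.5 (precision 0.75, recall 0.375)
```

Mined patterns could then be ranked by this score and filtered with a threshold; choosing beta > 1 would weight recall more heavily than precision.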
GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs
