RiskSEA: A Scalable Graph Embedding for Detecting On-chain Fraudulent Activities on the Ethereum Blockchain

Ayush Agarwal, Lv Lu, Arjun Maheswaran, Varsha Mahadevan, Bhaskar Krishnamachari
2024-10-03
Abstract: Like any other useful technology, cryptocurrencies are sometimes used for criminal activities. While transactions are recorded on the blockchain, there is a need for a more rapid and scalable method to detect addresses associated with fraudulent activities. We present RiskSEA, a scalable risk scoring system capable of effectively handling the dynamic nature of large-scale blockchain transaction graphs. The risk scoring system, which we implement for Ethereum, consists of: 1. a scalable approach to generating node2vec embeddings for the entire set of addresses to capture the graph topology; 2. transaction-based features to capture the transactional behavioral pattern of an address; and 3. a classifier model that combines the node2vec embeddings and behavioral features to generate risk scores for addresses. Because efficiently generating node2vec embeddings for large-scale, dynamically evolving blockchain transaction graphs is challenging, we present two novel approaches for generating node2vec embeddings and effectively scaling them to the entire set of blockchain addresses: 1. node2vec embedding propagation and 2. dynamic node2vec embedding. We present a comprehensive analysis of the proposed approaches. Our experiments show that combining behavioral and node2vec features boosts classification performance significantly, and that the dynamic node2vec embeddings perform better than the node2vec propagated embeddings.
Cryptography and Security, Artificial Intelligence, Machine Learning
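The abstract builds on node2vec, which learns a vector for each address by sampling biased second-order random walks over the transaction graph and feeding those walks to a skip-gram model. The sketch below shows only the walk-sampling step of standard node2vec, under simplifying assumptions that are not taken from the paper: the graph is treated as undirected and unweighted, and the helper names (`build_graph`, `node2vec_walk`, `generate_walks`) and toy addresses are hypothetical. The return parameter `p` and in-out parameter `q` bias each step relative to the previous node, as in the original node2vec formulation.

```python
import random
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (sender, receiver) address pairs.

    Simplification (assumption): direction, value, and edge multiplicity are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # drop self-transfers
            adj[u].add(v)
            adj[v].add(u)
    return adj

def node2vec_walk(adj, start, length, p, q, rng):
    """Sample one node2vec second-order biased random walk starting at `start`."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility under a fixed seed
        if not nbrs:
            break
        if len(walk) == 1:
            # The first step has no previous node, so pick a neighbor uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward (distance 2 from the previous node)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Run `num_walks` passes, starting one walk from every node per pass."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for n in nodes:
            walks.append(node2vec_walk(adj, n, length, p, q, rng))
    return walks

# Toy transaction graph with hypothetical addresses.
edges = [("0xA", "0xB"), ("0xB", "0xC"), ("0xC", "0xA"), ("0xC", "0xD")]
adj = build_graph(edges)
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5, seed=42)
```

The resulting walks would then be passed to a skip-gram implementation (for example, gensim's `Word2Vec`) to obtain one vector per address. This sketch recomputes walks over the whole graph; the paper's propagation and dynamic approaches are aimed precisely at avoiding that cost at Ethereum scale.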
What problem does this paper attempt to address?
This paper addresses the problem of detecting addresses involved in fraudulent activity on the Ethereum blockchain. Specifically, the authors propose **RiskSEA**, a scalable risk-scoring system for efficiently detecting and predicting fraudulent behavior by Ethereum addresses. The main problems motivating the work are:

1. **Pseudonymity and the lack of identity binding**:
   - The pseudonymity of blockchain addresses makes it easier for criminals to commit cybercrime. In the traditional banking system, all transactions are recorded in privately controlled internal ledgers, and customers need direct approval from a financial institution to open accounts and conduct transactions. On a blockchain, users can transact without any third-party intermediary, which makes oversight harder.
2. **The dynamic nature of large-scale blockchain transaction graphs**:
   - As cryptocurrency use spreads, the numbers of transactions and wallets keep growing, so detection becomes time-consuming and hard to scale. A faster, more scalable method is therefore needed to detect addresses associated with fraud.
3. **Limitations of existing methods**:
   - Prior work has mainly modeled fraudulent transaction patterns from features such as transaction timestamps and amounts. These approaches have not fully explored graph embeddings for fraud detection, especially on large-scale, dynamically evolving blockchain transaction graphs.

### Specific objectives of the paper

- **Develop a scalable risk-scoring system**: generate a risk score for every address on the Ethereum blockchain that estimates how likely the address is to be involved in fraudulent activity.
- **Combine behavioral and graph-embedding features**: improve classification performance by combining transactional behavioral features (such as transaction amounts and timestamps) with graph-embedding features (which capture how nodes are connected).
- **Address the challenges of large-scale graph-embedding generation**: propose two novel methods, **node2vec embedding propagation** and **dynamic node2vec embedding**, to handle the computational cost of large, continually changing blockchain transaction graphs.

### Main contributions

1. **The RiskSEA risk-scoring system**: a new system that efficiently generates standardized risk scores for Ethereum addresses.
2. **A comprehensive feature set**: combining behavioral and graph-embedding features improves classification accuracy.
3. **Evidence for graph-embedding features**: ablation experiments show that graph-embedding features are more effective than behavioral features alone, and that combining the two performs best.
4. **Scalable methods**: two methods for generating node2vec embeddings that address the scalability problem posed by large blockchain transaction graphs.
5. **Evaluation**: a comprehensive performance evaluation demonstrating the advantages of dynamic node2vec on evolving transaction graphs, where dynamic node2vec embeddings outperform the propagated node2vec embeddings.

### Conclusion

By introducing RiskSEA, the paper fills a gap in existing research on generating graph embeddings for large-scale blockchain transaction graphs and provides an effective, scalable fraud-detection solution.
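The objectives above describe a classifier that turns the combination of an address's behavioral features and its node2vec embedding into a risk score. The summary does not say which classifier the authors use, so the sketch below substitutes plain logistic regression trained by full-batch gradient descent, using only the standard library. The two behavioral features, the two-dimensional stand-in embeddings, and the labels are made-up toy values, and all function names are hypothetical.

```python
import math

def combine_features(behavioral, embedding):
    """Concatenate an address's behavioral features with its node2vec embedding."""
    return list(behavioral) + list(embedding)

def sigmoid(z):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # predicted probability minus label
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def risk_score(w, b, x):
    """Risk score in (0, 1): the model's estimated probability that the address is fraudulent."""
    return sigmoid(dot(w, x) + b)

# Hypothetical toy data: two scaled behavioral features per address ...
behavioral = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
# ... and a 2-dimensional stand-in for each address's node2vec embedding.
embeddings = [[0.7, 0.1], [0.6, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]  # 1 = flagged as fraudulent, 0 = benign

X = [combine_features(bf, emb) for bf, emb in zip(behavioral, embeddings)]
w, b = train_logistic(X, labels)
scores = [risk_score(w, b, x) for x in X]
```

Logistic regression is used here only because it is short and naturally yields scores in (0, 1). In practice the features would typically be standardized first, and the model would be evaluated on held-out addresses rather than on its own training data.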