Ethereum Fraud Detection via Joint Transaction Language Model and Graph Representation Learning

Yifan Jia, Yanbin Wang, Jianguo Sun, Yiwei Liu, Zhang Sheng, Ye Tian
2024-09-09
Abstract: Ethereum faces growing fraud threats. Current fraud detection methods, whether employing graph neural networks or sequence models, fail to consider the semantic information and similarity patterns within transactions. Moreover, these approaches do not leverage the potential synergistic benefits of combining both types of models. To address these challenges, we propose TLMG4Eth, which combines a transaction language model with graph-based methods to capture the semantic, similarity, and structural features of Ethereum transaction data. We first propose a transaction language model that converts numerical transaction data into meaningful transaction sentences, enabling the model to learn explicit transaction semantics. Then, we propose a transaction attribute similarity graph to learn transaction similarity information, enabling us to capture intuitive insights into transaction anomalies. Additionally, we construct an account interaction graph to capture the structural information of the account transaction network. We employ a deep multi-head attention network to fuse transaction semantic and similarity embeddings, and ultimately propose a joint training approach for the multi-head attention network and the account interaction graph to obtain the synergistic benefits of both.
Cryptography and Security, Machine Learning, General Finance
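To make the idea of a "transaction sentence" concrete, here is a minimal Python sketch under stated assumptions. The `Transaction` fields (direction, amount, time gap, counterparty), the sentence template, and the helpers `to_sentence` and `account_document` are all hypothetical illustrations; they are not the template TLMG4Eth actually uses, and the paper's attribute set and wording may differ.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields; the paper's exact attribute set may differ.
    direction: str      # "in" or "out", relative to the account being described
    amount_eth: float   # value transferred, in ETH
    time_delta_s: int   # seconds since the account's previous transaction
    counterparty: str   # address of the other account


def to_sentence(tx: Transaction) -> str:
    """Render one numerical transaction record as a natural-language sentence."""
    verb = "received" if tx.direction == "in" else "sent"
    prep = "from" if tx.direction == "in" else "to"
    # ":g" drops trailing zeros, so 2.0 ETH is written as "2 ETH".
    return (
        f"{tx.time_delta_s} seconds after its previous transaction, "
        f"the account {verb} {tx.amount_eth:g} ETH {prep} {tx.counterparty}."
    )


def account_document(txs: list[Transaction]) -> str:
    """Join an account's sentences, in the order given (assumed chronological)."""
    return " ".join(to_sentence(tx) for tx in txs)


if __name__ == "__main__":
    history = [
        Transaction("in", 2.0, 60, "0xdef"),
        Transaction("out", 1.5, 3600, "0xabc"),
    ]
    print(account_document(history))
```

The resulting text could then be tokenized and fed to a language model. The design intent illustrated here is that spelling out direction, amount, and timing in words gives the model explicit semantics instead of a bare vector of numbers.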
What problem does this paper attempt to address?
This paper addresses the increasingly serious fraud threats facing Ethereum. Current fraud detection methods, whether based on Graph Neural Networks (GNNs) or sequence models, do not fully consider the semantic information and similarity patterns in transaction data, nor do they exploit the potential synergy of combining the two kinds of models. Specifically, the paper identifies three main problems with current methods:

1. **Lack of Transaction Semantic Information**: Existing methods rely on the numerical form of transaction data and do not interpret the underlying intent, which makes it difficult for the model to understand what a transaction actually means.
2. **Insufficient Modeling of Transaction Similarity**: Similarity information extracted from transaction attributes (such as amount, direction, and time) is crucial for distinguishing normal from abnormal transactions, but previous research has overlooked it.
3. **Insufficient Synergy Optimization**: Although some studies have tried to combine GNNs with sequence models, they usually adopt late fusion, that is, training the models separately and concatenating their features only at the final stage, which fails to fully exploit the synergy between the two approaches.

To solve these problems, the authors propose TLMG4Eth, a method that combines a Transaction Language Model (TLM) with graph representation learning to capture the semantic, similarity, and structural features of Ethereum transaction data. Specifically:

- **Transaction Language Model (TLM)**: Converts numerical transaction data into meaningful transaction sentences, enabling the model to learn explicit transaction semantics.
- **Transaction Attribute Similarity Graph (TASG)**: Captures intuitive insights into transaction anomalies by modeling the global semantic similarity of transaction attributes.
- **Account Interaction Graph (AIG)**: Models the transaction behavior between accounts and captures the structural information of the transaction network.

The transaction semantic and similarity embeddings are fused through a deep multi-head attention network, and a joint training method is proposed to exploit the synergy between the multi-head attention network and the account interaction graph.

### Main Contributions

- Proposed a transaction language model that converts numerical transaction sequences into transaction sentences, thereby expressing transaction content and learning explicit transaction semantics.
- Constructed a transaction attribute similarity graph that models the global semantic similarity between transactions and captures intuitive insights into transaction anomalies.
- Used a multi-head attention network to fuse transaction semantic and similarity information, and proposed a joint training method with the account interaction graph to obtain the synergy between them.
- Significantly outperformed existing methods on three datasets, improving F1-score by 10% to 20%.
- Released a new dataset.

Through these innovations, TLMG4Eth aims to improve the accuracy and robustness of Ethereum fraud detection.
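The fusion step can be illustrated with a minimal NumPy sketch of multi-head self-attention over two "tokens": one account's semantic embedding and its similarity embedding. This is an assumption-laden illustration, not the paper's architecture. The function name `multi_head_fuse`, the equal embedding sizes, the random (untrained) projection weights, and the mean-pooling of the two outputs are all hypothetical choices. In TLMG4Eth the fusion network is trained jointly with the account interaction graph model; this sketch shows only a single forward pass.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_fuse(semantic: np.ndarray, similarity: np.ndarray,
                    num_heads: int = 4, seed: int = 0) -> np.ndarray:
    """Fuse two per-account embeddings of shape (d,) into one of shape (d,).

    Hypothetical sketch: weights are random and untrained, for illustration only.
    """
    d = semantic.shape[0]
    assert similarity.shape == (d,), "both embeddings are assumed to have size d"
    assert d % num_heads == 0, "d must be divisible by num_heads"
    d_head = d // num_heads

    rng = np.random.default_rng(seed)
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    tokens = np.stack([semantic, similarity])           # (2, d): one token per view
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v  # each (2, d)

    def split_heads(m: np.ndarray) -> np.ndarray:
        # (2, d) -> (num_heads, 2, d_head)
        return m.reshape(2, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, 2, 2)
    heads = softmax(scores) @ v                           # (num_heads, 2, d_head)

    # Concatenate heads back to (2, d), apply the output projection,
    # then mean-pool the two tokens into a single fused vector.
    merged = heads.transpose(1, 0, 2).reshape(2, d) @ w_o
    return merged.mean(axis=0)


if __name__ == "__main__":
    fused = multi_head_fuse(np.ones(8), np.zeros(8))
    print(fused.shape)
```

Because each view attends to the other, the fused vector mixes semantic and similarity information within the network itself, which is what distinguishes this kind of fusion from late concatenation. If the two embeddings had different sizes in practice, a linear projection to a common dimension would be needed first.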