Abstract: Web3 applications have recently been growing, especially on the Ethereum platform, and are becoming a target for scammers. Web3 scams imitate the services provided by legitimate platforms and mimic regular activity to deceive users. Current phishing account detection tools use graph learning or sampling algorithms to obtain graph features. However, large-scale transaction networks with temporal attributes conform to a power-law distribution, which makes Web3 scams challenging to detect. In this paper, we present ScamSweeper, a novel framework to identify Web3 scams on Ethereum. Furthermore, we collect a large-scale transaction dataset consisting of Web3 scam, phishing, and normal accounts. Our experiments indicate that ScamSweeper exceeds the state of the art in detecting Web3 scams.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper addresses the detection of Web3 scam accounts on the Ethereum network. With the rapid growth of Web3 applications (especially those built on Ethereum), scammers have begun to target these platforms. Web3 scams deceive users by imitating the services of legitimate platforms and disguising their behavior as normal activity, which poses serious security risks to users.
Existing phishing account detection tools mainly rely on graph learning or sampling algorithms to obtain graph features. However, large-scale transaction networks with temporal attributes follow a power-law distribution, which makes Web3 scams hard to detect with these tools. The paper contrasts two kinds of attack:
1. **Traditional phishing attacks**: These operate off-chain. Like Web3 scams, they steal funds by deceiving users into connecting to fake intermediaries.
2. **Web3 scams**: These operate on-chain. They pose as normal services, which lets attackers quietly transfer tokens without the user's knowledge and cover their tracks. This makes Web3 scams more difficult to detect and trace.
To address these problems, the paper proposes ScamSweeper, a new framework for identifying Web3 scam accounts on Ethereum. ScamSweeper uses an improved random walk method to construct transaction graphs that capture both structure and time, and divides these graphs into multiple directed sub-graphs. The sub-graphs are then arranged in chronological order and fed into a transposed Transformer, which captures how they evolve over time. The resulting representation is used to detect scam accounts.
### Formula presentation
Some of the key formulas and symbols involved in the paper are as follows:
- **Structured-time random walk (STRWalk)**:
  - Given the current node \(v_i\), the set of its neighbor nodes is \(\{v_1, v_2,\dots, v_N\}\).
  - The set of timestamps of the edges between \(v_i\) and these neighbors is \(T = \{t_{i1}, t_{i2},\dots, t_{iN}\}\).
  - The probability of selecting the edge \(e_{ij}\) connecting \(v_i\) and \(v_j\) is:
    \[
    P(e_{ij})=\frac{\exp(\alpha\cdot t_{ij})}{\sum_{k = 1}^{N}\exp(\alpha\cdot t_{ik})}
    \]
  - Randomly select a neighbor node \(v_j\) according to this probability.
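The edge-selection rule is a softmax over edge timestamps, so it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `edge_probabilities` and `sample_neighbor` are hypothetical, \(\alpha\) is treated as a plain scalar weight (the summary does not define it), and subtracting the maximum before exponentiating is an added numerical-stability step that leaves the probabilities unchanged.

```python
import math
import random


def edge_probabilities(timestamps, alpha):
    """Softmax over alpha * t for the edges leaving the current node."""
    scaled = [alpha * t for t in timestamps]
    # Shift by the maximum so exp() cannot overflow on raw Unix timestamps.
    # The softmax is shift-invariant, so the result matches the formula.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_neighbor(neighbors, timestamps, alpha, rng=random):
    """Pick one neighbor at random according to the softmax probabilities."""
    probs = edge_probabilities(timestamps, alpha)
    return rng.choices(neighbors, weights=probs, k=1)[0]
```

With a positive `alpha`, edges with later timestamps receive more probability mass; with `alpha = 0` the walk reduces to a uniform choice among neighbors.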
- **Sub-graph sequence representation**:
  - The sub-graph sequence \(G=\{G_0, G_1,\dots, G_T\}\) is sorted by time, where \(G_t\) represents the sub-graph of the \(t\)-th time interval.
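A time-ordered sub-graph sequence of this kind can be sketched by bucketing timestamped edges into consecutive intervals. The summary does not say how the interval boundaries are chosen, so the equal-width windows below are an assumption, as are the function name `split_by_time` and the `(src, dst, timestamp)` edge format.

```python
def split_by_time(edges, num_intervals):
    """Group directed edges (src, dst, timestamp) into time-ordered buckets.

    Assumption: intervals are equal-width windows between the earliest and
    latest timestamp. Bucket t plays the role of the sub-graph G_t.
    """
    buckets = [[] for _ in range(num_intervals)]
    if not edges:
        return buckets
    t_min = min(t for _, _, t in edges)
    t_max = max(t for _, _, t in edges)
    # Fall back to width 1.0 when every edge has the same timestamp.
    width = (t_max - t_min) / num_intervals or 1.0
    for src, dst, t in edges:
        # Clamp so the latest edge lands in the last bucket.
        index = min(int((t - t_min) / width), num_intervals - 1)
        buckets[index].append((src, dst, t))
    return buckets
```

Each bucket keeps the edge direction, so it can be treated as one directed sub-graph in the sequence.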
- **Dynamic evolution learning**:
  - Use GAT (Graph Attention Network) to extract sub-graph features:
    \[
    h_t = \text{GAT}(G_t)
    \]
  - Construct a time-ordered sub-graph feature sequence \(\Phi=(h_0, h_1,\dots, h_T)\in\mathbb{R}^{(T+1)\times d}\), where \(d\) is the hidden layer size.
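To make the feature-extraction step concrete, the sketch below implements one single-head graph attention layer in plain Python and stacks one vector per sub-graph into \(\Phi\). It is not the paper's model: the mean-pooling readout, the self-loops, the omission of the output nonlinearity, and the names `gat_layer`, `subgraph_embedding`, and `build_phi` are all assumptions made for illustration. Every edge endpoint is assumed to have an entry in `features`.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, edges, W, a):
    """Single-head graph attention over a directed sub-graph.

    features: dict node -> list of f_in floats
    edges:    list of directed (src, dst) pairs
    W:        f_in x d weight matrix (list of rows)
    a:        attention vector of length 2 * d
    """
    d = len(W[0])
    # Linear projection z_i = h_i W.
    z = {
        n: [sum(h[k] * W[k][c] for k in range(len(h))) for c in range(d)]
        for n, h in features.items()
    }
    # Each node attends to itself and to the sources of its incoming edges.
    incoming = {n: [n] for n in features}
    for src, dst in edges:
        incoming[dst].append(src)
    out = {}
    for i, nbrs in incoming.items():
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j]))) for j in nbrs
        ]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        att = [e / total for e in exps]  # attention weights sum to 1
        out[i] = [sum(att[p] * z[j][c] for p, j in enumerate(nbrs)) for c in range(d)]
    return out


def subgraph_embedding(node_vectors):
    """Mean-pool node vectors into one sub-graph vector (assumed readout)."""
    n = len(node_vectors)
    d = len(next(iter(node_vectors.values())))
    return [sum(v[c] for v in node_vectors.values()) / n for c in range(d)]


def build_phi(subgraphs, W, a):
    """Stack one embedding per (features, edges) sub-graph, in time order."""
    return [subgraph_embedding(gat_layer(f, e, W, a)) for f, e in subgraphs]
```

`build_phi` returns one row per sub-graph and `d` columns, matching the layout of \(\Phi\) described above.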
- **Transposed Transformer structure**:
  - The input linear layers \(\Theta_q, \Theta_k, \Theta_v\in\mathbb{R}^{d\times d}\) generate the query \(Q\), key \(K\), and value \(V\):
    \[
    Q = \Theta_q\cdot\Phi,\quad K=\Theta_k\cdot\Phi,\quad V = \Theta_v\cdot\Phi
    \]
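A small shape check helps when reading the projection formulas. With \(\Theta\in\mathbb{R}^{d\times d}\) and \(\Phi\) storing one row per sub-graph, the products only line up if \(\Phi\) is transposed first, which is one plausible reading of "transposed" here. The sketch below takes that reading as an assumption. It also applies standard scaled dot-product attention, which the summary does not spell out, so that step is an assumption too. The helper names are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax_rows(m):
    result = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def project_qkv(phi, theta_q, theta_k, theta_v):
    """Apply the d x d projections to the transposed feature sequence.

    phi has one row per sub-graph (n x d); its transpose is d x n,
    so each of Q, K, V is d x n.
    """
    phi_t = transpose(phi)
    return matmul(theta_q, phi_t), matmul(theta_k, phi_t), matmul(theta_v, phi_t)


def attention(q, k, v):
    """Standard scaled dot-product attention (assumed, not from the source)."""
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)
```

Under this reading the attention weights form a `d x d` matrix over feature channels rather than over time steps, and the output keeps the `d x n` shape of the transposed input.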
With these components, ScamSweeper targets Web3 scam accounts in large-scale transaction networks; according to the abstract, it outperforms state-of-the-art detectors in the authors' experiments.