Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics

Youssef Elmougy, Ling Liu
DOI: https://doi.org/10.1145/3580305.3599803
2023-05-26
Abstract: Blockchain provides the unique and accountable channel for financial forensics by mining its open and immutable transaction data. A recent surge has been witnessed by training machine learning models with cryptocurrency transaction data for anomaly detection, such as money laundering and other fraudulent activities. This paper presents a holistic applied data science approach to fraud detection in the Bitcoin network with two original contributions. First, we contribute the Elliptic++ dataset, which extends the Elliptic transaction dataset to include over 822k Bitcoin wallet addresses (nodes), each with 56 features, and 1.27M temporal interactions. This enables both the detection of fraudulent transactions and the detection of illicit addresses (actors) in the Bitcoin network by leveraging four types of graph data: (i) the transaction-to-transaction graph, representing the money flow in the Bitcoin network, (ii) the address-to-address interaction graph, capturing the types of transaction flows between Bitcoin addresses, (iii) the address-transaction graph, representing the bi-directional money flow between addresses and transactions (BTC flow from input address to one or more transactions and BTC flow from a transaction to one or more output addresses), and (iv) the user entity graph, capturing clusters of Bitcoin addresses representing unique Bitcoin users. Second, we perform fraud detection tasks on all four graphs by using diverse machine learning algorithms. We show that adding enhanced features from the address-to-address and the address-transaction graphs not only assists in effectively detecting both illicit transactions and illicit addresses, but also assists in gaining in-depth understanding of the root cause of money laundering vulnerabilities in cryptocurrency transactions and the strategies for fraud detection and prevention.
Released at http://github.com/git-disl/EllipticPlusPlus.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper addresses the detection of fraudulent transactions and illicit nodes (addresses) in the Bitcoin network. Specifically:

1. **Improving the accuracy and interpretability of fraud detection**: The paper proposes a holistic data science approach that uses machine learning models to detect fraudulent transactions and illicit addresses (actors) in the Bitcoin network. Beyond detection itself, the approach aims to explain the results and to expose the root causes of money laundering vulnerabilities.
2. **Expanding the dataset to enhance detection capabilities**: The paper contributes a new dataset, Elliptic++, which extends the existing Elliptic transaction dataset. It includes over 822,000 Bitcoin wallet addresses (nodes), each with 56 features, and 1.27 million temporal interactions. This makes it possible to detect illicit addresses as well as fraudulent transactions.
3. **Using multiple graph structures for analysis**: The paper uses four types of graph data to represent the relationships between transactions and addresses in the Bitcoin network:
   - **Transaction-to-Transaction Graph**: Represents the money flow in the Bitcoin network.
   - **Address-to-Address Interaction Graph**: Captures the types of transaction flows between Bitcoin addresses.
   - **Address-Transaction Graph**: Represents the bi-directional money flow between addresses and transactions (from input addresses to transactions, and from transactions to output addresses).
   - **User Entity Graph**: Captures clusters of Bitcoin addresses that represent unique Bitcoin users.
4. **Evaluating different machine learning algorithms**: The paper applies multiple machine learning algorithms to these four graphs, including Random Forest, Multilayer Perceptron, Long Short-Term Memory, and Extreme Gradient Boosting, and evaluates their performance on the fraud detection task.

Through these methods, the paper aims to provide a powerful tool for financial forensics to identify and prevent fraud in the Bitcoin network.
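
To make the relationship between the graph views more concrete, here is a minimal Python sketch under stated assumptions. It starts from a toy address-transaction edge list and derives (a) address-to-address edges, by linking every input address of a transaction to every output address of that transaction, and (b) candidate user entities, by grouping addresses that appear together as inputs of one transaction. The toy data, the function names `address_to_address_edges` and `cluster_by_shared_inputs`, and the shared-input grouping rule are all illustrative assumptions; they are not taken from the paper or from the Elliptic++ file format, and the paper's own construction may differ.

```python
from collections import defaultdict

# Hypothetical toy edges (not real Elliptic++ data):
# (input_address, transaction) and (transaction, output_address).
addr_tx = [("a1", "t1"), ("a2", "t1"), ("a3", "t2")]
tx_addr = [("t1", "a3"), ("t1", "a4"), ("t2", "a5")]


def address_to_address_edges(addr_tx, tx_addr):
    """Link each input address of a transaction to each of its output addresses."""
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for addr, tx in addr_tx:
        inputs[tx].add(addr)
    for tx, addr in tx_addr:
        outputs[tx].add(addr)
    edges = set()
    for tx, senders in inputs.items():
        for sender in senders:
            for receiver in outputs.get(tx, ()):
                edges.add((sender, receiver))
    return sorted(edges)


def cluster_by_shared_inputs(addr_tx):
    """Group addresses that co-occur as inputs of a transaction (union-find).

    Assumption: co-spending addresses belong to one user. This is a common
    heuristic, used here only for illustration.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_input = {}  # transaction -> first input address seen for it
    for addr, tx in addr_tx:
        find(addr)  # register the address
        if tx in first_input:
            parent[find(addr)] = find(first_input[tx])  # merge the two sets
        else:
            first_input[tx] = addr

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


aa_edges = address_to_address_edges(addr_tx, tx_addr)
entities = cluster_by_shared_inputs(addr_tx)
print(aa_edges)
print(entities)
```

In this sketch, addresses that only ever receive funds (such as `a4` and `a5`) do not appear in any cluster, because the grouping rule looks at input addresses only. A fuller treatment would also carry edge attributes such as amounts and time steps, which are omitted here.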