ORBITAAL: A Temporal Graph Dataset of Bitcoin Entity-Entity Transactions

Célestin Coquidé, Rémy Cazabet
2024-08-26
Abstract: Research on Bitcoin (BTC) transactions is a matter of interest for both economic and network science fields. Although this cryptocurrency is based on a decentralized system, making transaction details freely accessible, making raw blockchain data analyzable is not straightforward due to the Bitcoin protocol specificity and data richness. To address the need for an accessible dataset, we present ORBITAAL, the first comprehensive dataset based on temporal graph formalism. The dataset covers all Bitcoin transactions from January 2009 to January 2021. ORBITAAL provides temporal graph representations of entity-entity transaction networks, snapshots and stream graph. Each transaction value is given in Bitcoin and US dollar regarding daily-based conversion rate. This dataset also provides details on entities such as their global BTC balance and associated public addresses.
Social and Information Networks, Cryptography and Security, Discrete Mathematics, Dynamical Systems, Physics and Society
### What problems does this paper attempt to solve?

This paper addresses the accessibility and usability of Bitcoin (BTC) transaction data for scientific research. Although Bitcoin's blockchain data is public, the specifics of the protocol and the complexity of the data make it hard to extract useful information directly from the raw blockchain. Specifically, the paper targets the following key problems:

1. **Lack of standardized, easy-to-analyze datasets**:
   - Existing Bitcoin transaction datasets are either samples or raw data, so researchers must carry out complex preprocessing before any analysis.
   - Many existing datasets cover only part of the transactions or are limited to address-to-address transactions, so they cannot fully reflect the transaction network among users.
2. **Application of temporal graph representation**:
   - Bitcoin transactions are inherently time-stamped, so a temporal graph representation captures their dynamics better than a static graph.
   - Most existing datasets do not use a temporal graph representation, which limits the study of time dependence.
3. **User aggregation and address clustering**:
   - A Bitcoin transaction can involve multiple input and output addresses, and several of these addresses may belong to the same user. Clustering addresses into users effectively is a challenge.
   - The paper handles this by applying the common-input heuristic together with external knowledge bases such as WalletExplorer.
4. **Requirement for large-scale data processing and analysis tools**:
   - The volume of Bitcoin transaction data is huge, and traditional static graph analysis tools struggle to handle data of this scale.
   - Tools and algorithms suited to large-scale network analysis are needed to meet this demand.
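The common-input heuristic mentioned in point 3 treats all addresses that appear together as inputs of one transaction as belonging to the same entity. The following is a minimal sketch of that idea using a union-find structure. The transaction layout (`inputs`/`outputs` lists of address strings), the function name `cluster_addresses`, and the toy addresses are assumptions made for illustration; they are not the ORBITAAL pipeline or its file format, and the sketch leaves out the WalletExplorer labels.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that co-occur as inputs of the same transaction."""
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk up to the root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)
        # Common-input heuristic: all inputs of one transaction share an owner.
        for addr in inputs[1:]:
            union(inputs[0], addr)
        # Output addresses are only registered; they are not merged with inputs.
        for addr in tx["outputs"]:
            find(addr)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return list(clusters.values())


# Hypothetical toy transactions (addresses are placeholders, not real ones).
toy_transactions = [
    {"inputs": ["a1", "a2"], "outputs": ["b1"]},
    {"inputs": ["a2", "a3"], "outputs": ["c1"]},
    {"inputs": ["b1"], "outputs": ["a1"]},
]
entities = cluster_addresses(toy_transactions)
```

In this toy input, `a1`, `a2`, and `a3` end up in one entity because `a2` links the first two transactions, while `b1` and `c1` stay as single-address entities.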
### Solutions

To solve the above problems, the paper proposes the **ORBITAAL** dataset, a comprehensive Bitcoin transaction dataset based on the temporal graph representation. The main features of ORBITAAL include:

- **Wide time range**: It covers all Bitcoin transactions from January 2009 to January 2021.
- **Multiple temporal graph representations**: It provides a stream graph and snapshot representations at different time scales (year, month, day, hour).
- **Dual labeling of transaction amounts**: Every transaction amount is given in Bitcoin (BTC) and in US dollars (USD), converted at the daily exchange rate.
- **Detailed user information**: It provides the life cycle of each user, the list of associated public addresses, and the final Bitcoin balance.
- **User aggregation**: It simplifies the transaction network by merging multiple addresses into a single user through address clustering.

With the ORBITAAL dataset, researchers can more conveniently carry out temporal graph analysis of the Bitcoin transaction network and explore topics such as economic relationships and changes in network structure.
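The relationship between the stream graph, the snapshots, and the dual BTC/USD amounts can be illustrated with a short sketch. It assumes a stream graph given as `(unix_timestamp, source_entity, destination_entity, btc_amount)` tuples and a dictionary of daily USD-per-BTC rates. The tuple layout, the function names, and the rates below are placeholders invented for the example; they are not ORBITAAL's actual schema or real prices.

```python
from collections import defaultdict
from datetime import datetime, timezone

# strftime patterns for the four snapshot resolutions named in the summary.
FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%d %H"}


def snapshot_key(timestamp, resolution):
    """Map a Unix timestamp (UTC) to the label of the snapshot it falls in."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.strftime(FORMATS[resolution])


def build_snapshots(stream, usd_per_btc, resolution="day"):
    """Aggregate stream-graph edges into one weighted graph per time window.

    Returns {window_label: {(src, dst): (total_btc, total_usd)}}.
    """
    snapshots = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for timestamp, src, dst, btc in stream:
        # USD value always uses the rate of the transaction's own day.
        usd = btc * usd_per_btc[snapshot_key(timestamp, "day")]
        edge = snapshots[snapshot_key(timestamp, resolution)][(src, dst)]
        edge[0] += btc
        edge[1] += usd
    return {
        window: {edge: tuple(totals) for edge, totals in graph.items()}
        for window, graph in snapshots.items()
    }


# Hypothetical stream graph: three transfers on 2020-01-01 and 2020-01-02 (UTC).
stream = [
    (1577836800, "e1", "e2", 1.0),
    (1577880000, "e1", "e2", 0.5),
    (1577923200, "e2", "e3", 2.0),
]
# Made-up daily rates, for illustration only.
rates = {"2020-01-01": 7000.0, "2020-01-02": 7100.0}

daily = build_snapshots(stream, rates, resolution="day")
monthly = build_snapshots(stream, rates, resolution="month")
```

Converting each transaction at its own day's rate before aggregating keeps the USD totals consistent across resolutions: the monthly snapshot holds the same USD sums as the daily snapshots it covers.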