Evaluating the Security of Merkle Trees in the Internet of Things: An Analysis of Data Falsification Probabilities

Oleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Kateryna Kuznetsova, Dzianis Kanonik, Oleksandr Domin
2024-04-18
Abstract: Addressing the critical challenge of ensuring data integrity in decentralized systems, this paper delves into the underexplored area of data falsification probabilities within Merkle Trees, which are pivotal in blockchain and Internet of Things (IoT) technologies. Despite their widespread use, a comprehensive understanding of the probabilistic aspects of data security in these structures remains a gap in current research. Our study aims to bridge this gap by developing a theoretical framework to calculate the probability of data falsification, taking into account various scenarios based on the length of the Merkle path and hash length. The research progresses from the derivation of an exact formula for falsification probability to an approximation suitable for cases with significantly large hash lengths. Empirical experiments validate the theoretical models, exploring simulations with diverse hash lengths and Merkle path lengths. The findings reveal a decrease in falsification probability with increasing hash length and an inverse relationship with longer Merkle paths. A numerical analysis quantifies the discrepancy between exact and approximate probabilities, underscoring the conditions for the effective application of the approximation. This work offers crucial insights into optimizing Merkle Tree structures for bolstering security in blockchain and IoT systems, achieving a balance between computational efficiency and data integrity.
Cryptography and Security
What problem does this paper attempt to address?
The paper evaluates the probability of data forgery in Merkle trees used in the Internet of Things (IoT), with the goal of ensuring data integrity and authenticity. Specifically, it develops a theoretical framework for calculating this probability and examines how the Merkle path length and the hash length affect it in different scenarios.

### Research Background

With the spread of IoT devices, the security and integrity of data have become crucial. Merkle trees are an efficient way to verify the integrity of large data structures and are widely used in blockchain and IoT technologies. However, few studies address the probability of data forgery in Merkle trees in these settings. This research addresses that gap and provides a theoretical basis for optimizing the Merkle tree structure, thereby enhancing the security of blockchain and IoT systems.

### Main Problem Description

The core problem of the paper can be formalized as follows: given a Merkle tree in which a data block \(D_i\) is replaced with \(D'_i\) while all other data blocks remain unchanged, find the probability \(P(R = R')\) that the recomputed root \(R'\) equals the original root \(R\).
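To make the setting concrete, the following minimal Python sketch builds a Merkle root over a handful of data blocks and recomputes it after one block is replaced. It is an illustration only, not the authors' code: the choice of SHA-256, the rule of duplicating the last node on odd-sized levels, the helper names `sha256_digest` and `merkle_root`, and the sample block contents are all assumptions made for this example.

```python
import hashlib
from typing import Sequence


def sha256_digest(data: bytes) -> bytes:
    """Hash function H; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: Sequence[bytes]) -> bytes:
    """Compute a Merkle root by hashing leaves, then hashing pairs upward."""
    if not blocks:
        raise ValueError("need at least one data block")
    level = [sha256_digest(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [
            sha256_digest(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical sensor readings standing in for the data blocks D_0 .. D_7.
blocks = [f"reading-{i}".encode() for i in range(8)]
root = merkle_root(blocks)

# Replace D_3 with D'_3 and leave every other block unchanged.
tampered = list(blocks)
tampered[3] = b"forged-reading"
root_tampered = merkle_root(tampered)

print("R  =", root.hex())
print("R' =", root_tampered.hex())
print("R == R':", root == root_tampered)
```

With a 256-bit hash the two roots differ in practice; the question studied in the paper is how likely it is that they coincide, as a function of the hash length and the path length.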
The falsification probability is defined as

\[P_{\text{falsification}}=P(R = R')\]

where the nodes on the Merkle path are computed level by level from the parent and sibling nodes of the level below, for the original and the modified tree respectively:

\[N^{(m)}_{\text{par}} = H(N^{(m - 1)}_{\text{par}}, N^{(m - 1)}_{\text{sib}})\]

\[N'^{(m)}_{\text{par}} = H(N'^{(m - 1)}_{\text{par}}, N'^{(m - 1)}_{\text{sib}})\]

Assuming that the hash function \(H\) with a \(b\)-bit output behaves like a random oracle, the probability that two different inputs produce the same output is

\[P(H(D_i)=H(D'_i))=\frac{1}{2^b}\]

### Research Method

To calculate the probability of data forgery exactly, the paper derives the following result. For any positive integer \(m\), the probability of data forgery when the Merkle path length is \(m\) is obtained by summing over the level \(k\) at which the original and modified paths first collide (no collision at the \(k\) preceding levels, then a collision):

\[P_{\text{falsification}}=\sum_{k = 0}^{m}\frac{1}{2^b}\left(1-\frac{1}{2^b}\right)^{k}\]

Summing this geometric series gives

\[P_{\text{falsification}}=1-\left(1-\frac{1}{2^b}\right)^{m + 1}\]

For a large hash length \(b\), the following approximation can be used:

\[P_{\text{falsification}}\approx1 - e^{-\frac{m + 1}{2^b}}\]

### Experimental Verification

The model was verified experimentally with a Python program. The results show that:

1. As the hash length \(b\) increases, the probability of data forgery decreases significantly.
2. As the Merkle path length \(m\) increases, the probability of data forgery increases slightly.

These findings help in optimizing the design of Merkle trees in practical applications and in balancing security against computational efficiency.

### Conclusion

The research provides a theoretical model and supports it with experimental results. This is significant for applications that must ensure data integrity and authenticity in blockchain and IoT systems.
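The closed-form probability, its exponential approximation, and the two trends reported under Experimental Verification can be checked with a short Python sketch. This is not the authors' program. The function names `p_exact`, `p_approx`, `truncated_hash`, and `simulate` are hypothetical, and the simulation rests on stated assumptions: SHA-256 truncated to its top \(b\) bits stands in for a \(b\)-bit random oracle, sibling nodes are drawn at random, and the tracked node is always the left child. A small \(b\) is used in the simulation so that collisions are frequent enough to observe.

```python
import hashlib
import math
import random


def p_exact(m: int, b: int) -> float:
    """1 - (1 - 2^-b)^(m+1), evaluated stably for large b via log1p/expm1."""
    return -math.expm1((m + 1) * math.log1p(-(2.0 ** -b)))


def p_approx(m: int, b: int) -> float:
    """Approximation 1 - exp(-(m+1) / 2^b)."""
    return -math.expm1(-(m + 1) * 2.0 ** -b)


def truncated_hash(data: bytes, b: int) -> int:
    """Top b bits of SHA-256 (assumed stand-in for a b-bit random oracle)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - b)


def simulate(m: int, b: int, trials: int, seed: int = 0) -> float:
    """Estimate P(R = R') for path length m and b-bit hashes by sampling."""
    rng = random.Random(seed)
    nbytes = (b + 7) // 8
    hits = 0
    for _ in range(trials):
        # Two distinct random data blocks D_i and D'_i.
        d = rng.getrandbits(128)
        d_forged = rng.getrandbits(128)
        while d_forged == d:
            d_forged = rng.getrandbits(128)
        node = truncated_hash(d.to_bytes(16, "big"), b)
        node_forged = truncated_hash(d_forged.to_bytes(16, "big"), b)
        # Walk m levels up; the sibling is the same in both trees.
        for _ in range(m):
            sibling = rng.getrandbits(b).to_bytes(nbytes, "big")
            node = truncated_hash(node.to_bytes(nbytes, "big") + sibling, b)
            node_forged = truncated_hash(
                node_forged.to_bytes(nbytes, "big") + sibling, b
            )
        hits += node == node_forged
    return hits / trials


# Exact vs. approximate values for a few hash and path lengths.
for b in (8, 16, 256):
    for m in (10, 20):
        print(f"b={b:3d} m={m:2d} "
              f"exact={p_exact(m, b):.6e} approx={p_approx(m, b):.6e}")

# Monte Carlo check with a deliberately short hash.
estimate = simulate(m=7, b=8, trials=20000)
print(f"simulated={estimate:.4f} exact={p_exact(7, 8):.4f}")
```

The `log1p`/`expm1` form is a design choice for this sketch: for realistic hash lengths such as \(b = 256\), evaluating `1 - (1 - 2**-b)**(m + 1)` directly in floating point would round to zero.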