A Formula That Generates Hash Collisions

Andrew Brockmann
DOI: https://doi.org/10.48550/arXiv.1808.10668
2018-08-31
Abstract: We present an explicit formula that produces hash collisions for the Merkle-Damgård construction. The formula works for arbitrary choice of message block and irrespective of the standardized constants used in hash functions, although some padding schemes may cause the formula to fail. This formula bears no obvious practical implications because at least one of any pair of colliding messages will have length double exponential in the security parameter. However, due to ambiguity in existing definitions of collision resistance, this formula arguably breaks the collision resistance of some hash functions.
Cryptography and Security
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **Does there exist an explicit formula that generates hash collisions for the Merkle-Damgård construction?**

### Specific problem description:

1. **Collision resistance of hash functions**:
   - Collision resistance is a fundamental property of hash functions: it should be computationally infeasible to find two different messages with the same hash value. Because hash values are often used as digital fingerprints, collision resistance is essential.
2. **Ambiguity in existing definitions**:
   - The paper points out that existing definitions of collision resistance are ambiguous. They do not specify how long the colliding messages may be, or whether the adversary must output the complete messages. Under some readings, certain hash functions would fail to be collision resistant, at least in theory.
3. **Method of generating collisions**:
   - The paper proposes an explicit formula that generates partial or complete collisions in hash functions built with the Merkle-Damgård construction. The formula works for an arbitrary choice of message block and does not depend on the standardized constants used in the hash function, although some padding schemes may cause it to fail.
4. **Limitations in practical applications**:
   - At least one of the two colliding messages produced by the formula has length double exponential in the security parameter, far beyond practical limits (for example, SHA-1 and SHA-256 only accept messages shorter than \(2^{64}\) bits). The method therefore has no obvious practical significance.

### Main contributions:

- **Theoretical result**:
  - A formula that generates hash collisions is proposed, although the colliding messages are far too long to be used in practice.
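- **Re-examination of collision resistance**:
  - The result prompts a rethinking of the existing definitions of collision resistance. In particular, if an adversary may output a compressed description of the colliding messages (such as this formula) instead of the messages themselves, the formula arguably breaks the collision resistance of some hash functions.

### Conclusion:

- Although the formula generates hash collisions in theory, it poses no practical security threat. To rule out this kind of theoretical attack, the definition of collision resistance can be strengthened by bounding the length of the input messages or by requiring the adversary to output the complete messages.

### Formula presentation:

According to the paper, the formula for generating collisions is:

\[ M_0 = [B]^{2^\ell + c \cdot (2^\ell)!} \]
\[ M_1 = [B]^{2^\ell + c' \cdot (2^\ell)!} \]

where \([B]^n\) denotes \(n\) concatenated copies of a fixed message block \( B \), \( \ell \) is the size of the internal state (chaining value) in bits, and \( c \neq c' \) are arbitrary natural numbers. The formula works because iterating the compression function on \( B \) from the initial value can visit at most \(2^\ell\) distinct states, so the sequence of chaining values is already on a cycle after \(2^\ell\) blocks, and the cycle length \(\lambda \le 2^\ell\) divides \((2^\ell)!\). Any two block counts that are at least \(2^\ell\) and differ by a multiple of \((2^\ell)!\) therefore yield the same chaining value. The length of \( M_0 \) is

\[ \text{Message length} = b \cdot \left(2^\ell + c \cdot (2^\ell)!\right) \]

bits (likewise for \( M_1 \) with \( c' \)), where \( b \) is the length of the message block in bits. Since \( c \neq c' \), at least one of them is \(\ge 1\), and for \(\ell \ge 2\) that message is longer than \( b \cdot 2^{2^\ell} \) bits (because \( n! \ge 2^n \) for \( n \ge 4 \)), i.e. double exponential in \(\ell\).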
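The cycle argument behind the formula can be checked on a toy Merkle-Damgård chain. The Python sketch below rests on several assumptions that are not from the paper: an 8-bit internal state (\(\ell = 8\)), a truncated SHA-256 standing in for the compression function, an arbitrary IV and block, and no padding at all (the abstract notes that some padding schemes can make the formula fail). All names (`compress`, `tail_and_cycle`, `chaining_value`, `exponent`) are hypothetical. Because \( M_1 \) has \(2^8 + 256!\) blocks, the code cannot absorb it block by block; it instead finds the tail length \(\mu\) and cycle length \(\lambda\) of the chain IV, f(IV, B), ... and reduces the block count modulo \(\lambda\).

```python
import hashlib
from math import factorial

ELL = 8          # toy chaining-value size in bits (hypothetical; SHA-256 uses 256)
BLOCK = b"B"     # the fixed message block B (any choice works)
IV = 0           # toy initial value (hypothetical)


def compress(state: int, block: bytes) -> int:
    """Toy compression function f: ELL-bit state x block -> ELL-bit state.
    Assumption: truncated SHA-256 stands in for a real compression function."""
    digest = hashlib.sha256(state.to_bytes(4, "big") + block).digest()
    return int.from_bytes(digest, "big") % (1 << ELL)


def tail_and_cycle(iv: int, block: bytes) -> tuple[int, int]:
    """Return (mu, lam): iterating g(x) = f(x, block) from iv enters a cycle
    of length lam after mu steps. With only 2^ELL states, mu + lam <= 2^ELL."""
    first_seen = {}
    state, i = iv, 0
    while state not in first_seen:
        first_seen[state] = i
        state = compress(state, block)
        i += 1
    mu = first_seen[state]
    return mu, i - mu


def chaining_value(n: int, iv: int, block: bytes) -> int:
    """Chaining value after absorbing [block]^n (no padding), computed by
    reducing n modulo the cycle length instead of iterating n times."""
    mu, lam = tail_and_cycle(iv, block)
    if n >= mu:
        n = mu + (n - mu) % lam
    state = iv
    for _ in range(n):
        state = compress(state, block)
    return state


def exponent(c: int) -> int:
    """Number of copies of B in M = [B]^(2^ELL + c * (2^ELL)!)."""
    return 2**ELL + c * factorial(2**ELL)


n0, n1 = exponent(0), exponent(1)           # c = 0 for M_0, c' = 1 for M_1
h0 = chaining_value(n0, IV, BLOCK)
h1 = chaining_value(n1, IV, BLOCK)
print(h0 == h1)                             # True
print((8 * len(BLOCK) * n1).bit_length())   # bits needed just to write len(M_1) in bits
```

Changing `BLOCK` or `IV` does not affect the outcome, mirroring the claim that the formula is independent of the message block and constants. The formula itself needs no search: the cycle detection only serves to evaluate the chain on a message far too long to process directly, and it is feasible here solely because the toy state is tiny.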