Abstract: We present an explicit formula that produces hash collisions for the Merkle-Damgård construction. The formula works for arbitrary choice of message block and irrespective of the standardized constants used in hash functions, although some padding schemes may cause the formula to fail. This formula bears no obvious practical implications because at least one of any pair of colliding messages will have length double exponential in the security parameter. However, due to ambiguity in existing definitions of collision resistance, this formula arguably breaks the collision resistance of some hash functions.

What problem does this paper attempt to address?

The problem this paper addresses is: **Does there exist an explicit formula that generates hash collisions in the Merkle-Damgård construction?**

### Specific problem description:

1. **Collision resistance of hash functions**:
   - Collision resistance is a fundamental property of hash functions: it should be computationally infeasible to find two different messages with the same hash value. Because hash values are often used as digital fingerprints, this property is essential.
2. **Ambiguity in existing definitions**:
   - The paper points out that existing definitions of collision resistance are ambiguous. They do not clearly specify how long the colliding messages may be, or whether the complete message content must be output. Under this ambiguity, some hash functions can be argued not to satisfy collision resistance in theory.
3. **Methods of generating collisions**:
   - The paper proposes an explicit formula that generates partial or complete collisions in hash functions built on the Merkle-Damgård construction. The formula applies to an arbitrarily chosen message block and is independent of the standardized constants used in the hash function, although some padding schemes may cause it to fail.
4. **Limitations in practical applications**:
   - The colliding messages produced by the formula have double-exponential length, far beyond practical limits (for example, SHA-1 and SHA-256 accept inputs of fewer than \(2^{64}\) bits). The method therefore has no obvious practical significance.

### Main contributions:

- **Theoretical breakthrough**:
  - A formula that generates hash collisions is proposed, although the colliding messages are far too long to be used in practice.
- **Re-examination of collision resistance**:
  - The result prompts a re-examination of existing definitions of collision resistance. In particular, if a compressed description of the colliding messages is accepted as output, the formula may be considered to break the collision resistance of some hash functions.

### Conclusion:

- Although the formula generates hash collisions in theory, it does not pose an actual security threat. To rule out this theoretical attack, the definition of collision resistance can be strengthened by limiting the input message length or by requiring the complete message to be output.

### Formula presentation:

According to the paper, the formula for generating collisions is:

\[ M_0 = [B]^{2^\ell + c \cdot (2^\ell)!} \]

\[ M_1 = [B]^{2^\ell + c' \cdot (2^\ell)!} \]

where \( B \) is a fixed message block, \( [B]^n \) denotes \( B \) repeated \( n \) times, \( \ell \) is the size of the internal state in bits, and \( c \) and \( c' \) are arbitrary natural numbers with \( c \neq c' \) (so that the two messages differ). The length of each colliding message follows directly from the exponent:

\[ \text{Message length} = b \cdot \left( 2^\ell + c \cdot (2^\ell)! \right) \]

where \( b \) is the length of the message block. Because of the factor \( (2^\ell)! \), this length is double-exponential in \( \ell \).
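The formula above can be checked on a toy example. One way to see why it should work (our reading, not a quotation from the paper): repeatedly applying the compression function with the same block \( B \) walks through at most \( 2^\ell \) states, so after \( 2^\ell \) steps the state lies on a cycle, and any possible cycle length divides \( (2^\ell)! \). The sketch below assumes a hypothetical 3-bit state, a toy compression function built by truncating SHA-256, an arbitrary initial value, and no padding; none of these choices come from the paper.

```python
import hashlib
from math import factorial

STATE_BITS = 3   # toy value of the state size (assumption; real hashes use far more)
IV = 0           # arbitrary toy initial value (assumption)


def compress(state, block):
    """Toy compression function (hypothetical): SHA-256 of state||block,
    truncated to STATE_BITS bits."""
    digest = hashlib.sha256(bytes([state]) + block).digest()
    return digest[0] % (1 << STATE_BITS)


def md_hash_repeated(block, count):
    """Merkle-Damgard iteration over `count` copies of `block`, without padding."""
    state = IV
    for _ in range(count):
        state = compress(state, block)
    return state


def colliding_lengths(c, c_prime):
    """Block counts of M_0 and M_1: 2^l + c*(2^l)! and 2^l + c'*(2^l)!."""
    n = 1 << STATE_BITS
    period = factorial(n)
    return n + c * period, n + c_prime * period


if __name__ == "__main__":
    n0, n1 = colliding_lengths(1, 2)
    for block in (b"A", b"hello", b"\x00" * 64):
        h0 = md_hash_repeated(block, n0)
        h1 = md_hash_repeated(block, n1)
        print(block, n0, n1, h0 == h1)
```

Even with a 3-bit state the messages already contain tens of thousands of blocks, which illustrates why the construction is out of reach for realistic state sizes. A padding scheme that encodes the message length into the final block is not modelled here; as the abstract notes, such padding may cause the formula to fail.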

A Formula That Generates Hash Collisions

Methods for Collisions in Some Algebraic Hash Functions

Time-Space Lower Bounds for Finding Collisions in Merkle–Damgård Hash Functions

Collision Resistance from Multi-collision Resistance

Generic Attacks on Hash Combiners

Quantum Collision Resistance of Double-Block-Length Hashing

A Secure Hash Function MD-192 With Modified Message Expansion

Construction and security analysis of hash algorithm based on stream cipher

$\varepsilon$-Almost collision-flat universal hash functions and mosaics of designs

The Sum Can Be Weaker Than Each Part.

An efficient multi-use multi-secret sharing scheme based on hash function

Security of iterated hash functions based on block ciphers

A New Non-MDS Hash Function Resisting Birthday Attack and Meet-in-the-middle Attack

Triangulating Rebound Attack on AES-like Hashing

Distributional Collision Resistance Beyond One-Way Functions

An Attack on Hash Function HAVAL-128

Attacks on a Double Length Blockcipher-Based Hash Proposal

Collision Resistant Hashing From Sub-Exponential Learning Parity With Noise

Attacks on Fast Double Block Length Hash Functions

Improved Indifferentiability Security Bound for the Prefix-Free Merkle-Damgård Hash Function

Hash Functions Based on Block Ciphers