Merkle Tree

lean4-proofcryptographyvisualization
hroot=H(H(h1h2)H(h3h4))h_{\text{root}} = H(H(h_1 \| h_2) \| H(h_3 \| h_4))

Statement

Given nn data blocks d1,,dnd_1, \ldots, d_n and a collision-resistant hash function HH, a Merkle tree assigns each leaf a hash hi=H(di)h_i = H(d_i) and builds internal nodes by hashing pairs:

hij=H(hihj)h_{ij} = H(h_i \| h_j)

The root hrooth_{\text{root}} commits to the entire dataset: changing any did_i changes hih_i, which propagates up to change the root. To prove membership of d1d_1 one supplies the authentication path (h2,h34)(h_2, h_{34}) and the verifier recomputes:

H(H(d1)h2)=h12,H(h12h34)=?hrootH(H(d_1) \| h_2) = h_{12}, \quad H(h_{12} \| h_{34}) \stackrel{?}{=} h_{\text{root}}

Two hashes suffice for a 4-leaf tree; in general log2n\lceil \log_2 n \rceil hashes suffice.

Visualization

Four-leaf Merkle tree with leaves A,B,C,DA, B, C, D:

            root = H(h_AB || h_CD)
           /                      \
    h_AB = H(h_A || h_B)    h_CD = H(h_C || h_D)
         /        \               /        \
    h_A=H(A)  h_B=H(B)     h_C=H(C)  h_D=H(D)
        A         B             C         D

Proof of membership for leaf AA (path marked with *):

Prover sends: (h_B, h_CD)
Verifier:
  step 1:  h_AB = H(h_A || h_B)   -- recompute using provided h_B
  step 2:  H(h_AB || h_CD) =? root -- check against known root

Toy example with H(x)=xmod97H(x) = x \bmod 97 (for illustration only — not a real hash):

NodeValue
hA=H(10)h_A = H(10)1010
hB=H(20)h_B = H(20)2020
hC=H(30)h_C = H(30)3030
hD=H(40)h_D = H(40)4040
hAB=H(10100+20)h_{AB} = H(10 \cdot 100 + 20)1020mod97=501020 \bmod 97 = 50
hCD=H(30100+40)h_{CD} = H(30 \cdot 100 + 40)3040mod97=803040 \bmod 97 = 80
root =H(50100+80)= H(50 \cdot 100 + 80)5080mod97=735080 \bmod 97 = 73

To verify A=10A = 10: recompute H(10)=10H(10) = 10, H(10100+20)=50H(10 \cdot 100 + 20) = 50, H(50100+80)=73H(50 \cdot 100 + 80) = 73 — matches the root.

Proof Sketch

  1. Collision resistance implies binding. If HH is collision resistant, no adversary can produce d1d_1' with H(d1)=h1H(d_1') = h_1 without finding a preimage, and they cannot produce a fake path without finding an internal collision.

  2. Path verification terminates. Starting from a leaf hash, each step merges two sibling hashes and moves one level up. After log2n\lceil \log_2 n \rceil steps we reach the root.

  3. Soundness. If the reconstructed root matches the committed root, then (under collision resistance) every hash on the path was correct, which means did_i is the authentic leaf.

  4. Succinct proof size. Each authentication path has length log2n\lceil \log_2 n \rceil, which for n=220n = 2^{20} is only 2020 hash values regardless of dataset size.

Connections

Merkle trees are a data-structural companion to Digital Signature (Schnorr)Digital Signature (Schnorr)gs=Rycg^s = R \cdot y^cA Schnorr signature (R, s) is valid iff g^s = R · y^c, verified by a single group equation.Read more → — signatures authenticate public keys while Merkle proofs authenticate data. The collision-resistance assumption is related to One-Way FunctionOne-Way Functionf(x)=gxmodpf(x) = g^x \bmod pThe discrete log f(x) = g^x mod p is easy to evaluate but conjectured hard to invert — a trapdoor that powers public-key cryptography.Read more → (preimage resistance). The tree structure mirrors Binomial TheoremBinomial Theorem(x+y)n=k=0n(nk)xkynk(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}Expanding powers of a sum via binomial coefficientsRead more → coefficient patterns in Pascal's triangle.

Lean4 Proof

/-- We model a toy Merkle tree over ℕ with H(x) = x % 97.
    Verify that the path proof for leaf 0 is correct. -/

def toyHash (x : ) :  := x % 97

def merkleLeaf (x : ) :  := toyHash x

-- Combine two hashes: pack as h1 * 100 + h2 then hash
def merkleNode (l r : ) :  := toyHash (l * 100 + r)

def hA :  := merkleLeaf 10   -- = 10
def hB :  := merkleLeaf 20   -- = 20
def hC :  := merkleLeaf 30   -- = 30
def hD :  := merkleLeaf 40   -- = 40

def hAB :  := merkleNode hA hB  -- = 50
def hCD :  := merkleNode hC hD  -- = 80
def root :  := merkleNode hAB hCD  -- = 73

/-- Path verification for leaf A = 10, given sibling h_B and uncle h_CD. -/
theorem merkle_verify_A :
    merkleNode (merkleNode (merkleLeaf 10) hB) hCD = root := by
  decide