Source Coding Theorem

lean4-proof · information-theory · visualization
$$H(X) \le \mathbb{E}[\ell] < H(X) + 1$$

Statement

Let $X$ be a discrete random variable with distribution $(p_1, \ldots, p_n)$ and Shannon entropy $H(X) = -\sum_i p_i \log_2 p_i$ (bits). For any prefix-free binary code with codeword lengths $\ell_i$, the expected length satisfies

$$H(X) \le \mathbb{E}[\ell] = \sum_{i} p_i \ell_i.$$

Moreover, the Huffman code (optimal prefix-free code) achieves

$$\mathbb{E}[\ell^*] < H(X) + 1.$$

Together: no prefix code can do better than $H(X)$ bits per symbol on average, and Huffman coding comes within 1 bit.

Visualization

Source with four symbols and distribution $P = (0.4, 0.3, 0.2, 0.1)$:

| Symbol | $p_i$ | Huffman code | $\ell_i$ | $p_i \ell_i$ | $-p_i \log_2 p_i$ |
|--------|-------|--------------|----------|--------------|-------------------|
| $a$    | 0.4   | 0            | 1        | 0.40         | 0.529             |
| $b$    | 0.3   | 10           | 2        | 0.60         | 0.521             |
| $c$    | 0.2   | 110          | 3        | 0.60         | 0.464             |
| $d$    | 0.1   | 111          | 3        | 0.30         | 0.332             |

$H(X) = 0.529 + 0.521 + 0.464 + 0.332 = \mathbf{1.846}$ bits.

$\mathbb{E}[\ell] = 0.40 + 0.60 + 0.60 + 0.30 = \mathbf{1.90}$ bits.

Check: $1.846 \le 1.90 < 1.846 + 1 = 2.846$. Both inequalities hold.
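As a quick sanity check, both numbers can be recomputed directly from the distribution. The snippet below is a small sketch in plain Lean 4 (Float arithmetic, no Mathlib), separate from the article's formal development:

-- Entropy -Σ pᵢ log₂ pᵢ computed from the distribution itself: ≈ 1.8464 bits.
#eval [0.4, 0.3, 0.2, 0.1].foldl (fun h p => h - p * Float.log2 p) 0.0

-- Expected length Σ pᵢ ℓᵢ with the Huffman lengths (1, 2, 3, 3): ≈ 1.9 bits.
#eval ([0.4, 0.3, 0.2, 0.1].zip [1.0, 2.0, 3.0, 3.0]).foldl (fun s (p, l) => s + p * l) 0.0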

Huffman tree construction (merge the two smallest weights at each step; a runnable sketch follows the tree):

Step 1: merge c(0.20) + d(0.10) -> cd(0.30)
Step 2: merge b(0.30) + cd(0.30) -> bcd(0.60)
Step 3: merge a(0.40) + bcd(0.60) -> root(1.00)

root
 |-- 0 : a
 |-- 1
      |-- 10 : b
      |-- 11
           |-- 110 : c
           |-- 111 : d
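
The greedy merge can also be written out as code. The following is a minimal, runnable sketch in plain Lean 4 (no Mathlib); the names HTree, insertByWeight, build, and codes are illustrative choices, not from the article, and weights are kept as probabilities in tenths so the arithmetic stays exact. Tie-breaking and child ordering are arbitrary, so the exact codewords can differ from the table above, but the lengths (1, 2, 3, 3) and the expected length 1.90 agree.

/-- A Huffman tree: leaves carry a symbol, every node carries its total weight
    (probability in tenths, so 0.1 ↦ 1, ..., 0.4 ↦ 4). Hypothetical sketch. -/
inductive HTree where
  | leaf (sym : String) (w : Nat)
  | node (w : Nat) (l r : HTree)

def HTree.weight : HTree → Nat
  | .leaf _ w   => w
  | .node w _ _ => w

/-- Insert a tree into a list kept sorted by ascending weight. -/
def insertByWeight (t : HTree) : List HTree → List HTree
  | [] => [t]
  | x :: xs =>
    if t.weight ≤ x.weight then t :: x :: xs else x :: insertByWeight t xs

/-- Repeatedly merge the two lightest trees; expects the input sorted by weight. -/
partial def build : List HTree → Option HTree
  | []  => none
  | [t] => some t
  | x :: y :: rest =>
    build (insertByWeight (.node (x.weight + y.weight) x y) rest)

/-- Read each leaf's codeword off the finished tree (left = 0, right = 1). -/
def codes (pre : String) : HTree → List (String × String)
  | .leaf s _   => [(s, pre)]
  | .node _ l r => codes (pre ++ "0") l ++ codes (pre ++ "1") r

#eval (build [.leaf "d" 1, .leaf "c" 2, .leaf "b" 3, .leaf "a" 4]).map (codes "")
-- some [("a", "0"), ("d", "100"), ("c", "101"), ("b", "11")]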

Proof Sketch

  1. Lower bound ($H(X) \le \mathbb{E}[\ell]$): Set $q_i = 2^{-\ell_i} / Z$ where $Z = \sum_i 2^{-\ell_i} \le 1$ by Kraft's inequality. Gibbs' inequality (non-negativity of KL divergence) gives $\sum_i p_i \log_2(p_i/q_i) \ge 0$, which expands to $\sum_i p_i \ell_i \ge -\sum_i p_i \log_2 p_i - \log_2 Z \ge H(X)$, using $\log_2 Z \le 0$.
  2. Upper bound ($\mathbb{E}[\ell^*] < H(X) + 1$): Choose $\ell_i = \lceil -\log_2 p_i \rceil$. These lengths satisfy Kraft's inequality, so a prefix-free code with them exists. Then $p_i \ell_i < p_i(-\log_2 p_i + 1)$; summing gives an expected length strictly below $H(X) + 1$ for that code, and the Huffman code, being optimal among prefix-free codes, can only be shorter, so $\mathbb{E}[\ell^*] < H(X) + 1$. A numerical check for this source follows below.
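
To make step 2 concrete for $P = (0.4, 0.3, 0.2, 0.1)$: the rounded-up lengths are $(2, 2, 3, 4)$, the Kraft sum is $0.6875 \le 1$, and the resulting expected length $2.4$ is already below $H(X) + 1 \approx 2.846$. A small sketch in plain Lean 4 (Float arithmetic; sfLen is a hypothetical helper name, not from the article):

/-- Shannon–Fano length ⌈-log₂ p⌉ for one probability (hypothetical helper). -/
def sfLen (p : Float) : Float := (Float.log2 (1.0 / p)).ceil

-- Lengths for P = (0.4, 0.3, 0.2, 0.1): [2.0, 2.0, 3.0, 4.0]
#eval [0.4, 0.3, 0.2, 0.1].map sfLen

-- Kraft sum Σ 2^(-ℓᵢ) = 0.6875 ≤ 1, so a prefix-free code with these lengths exists.
#eval ([0.4, 0.3, 0.2, 0.1].map sfLen).foldl (fun s l => s + Float.exp2 (-l)) 0.0

-- Expected length Σ pᵢ ℓᵢ ≈ 2.4 < H(X) + 1 ≈ 2.846; the Huffman code (1.9) does better.
#eval ([0.4, 0.3, 0.2, 0.1].zip ([0.4, 0.3, 0.2, 0.1].map sfLen)).foldl
        (fun s (p, l) => s + p * l) 0.0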

Connections

The lower bound relies on the non-negativity of KL divergence, which comes from the same convexity argument (Jensen's inequality applied to the logarithm) that underlies Shannon Entropy, $H(X) = -\sum_{i} p_i \log p_i \ge 0$. The feasibility condition for the codeword lengths is Kraft's Inequality: every prefix-free binary code satisfies $\sum_{i=1}^{n} 2^{-\ell_i} \le 1$. The bound has the same form as the AM–GM Inequality, $\frac{a_1+\cdots+a_n}{n} \ge \sqrt[n]{a_1 \cdots a_n}$, which is likewise Jensen applied to log terms.

Lean4 Proof

import Mathlib

/-- Verify both inequalities for the concrete source P = (0.4, 0.3, 0.2, 0.1)
    with Huffman lengths (1, 2, 3, 3). `H` is the rational approximation 1.846
    of the entropy from the table; all arithmetic is over ℚ. -/
theorem source_coding_concrete :
    let H  : ℚ := 529 / 1000 + 521 / 1000 + 464 / 1000 + 332 / 1000
    let El : ℚ := 4 / 10 * 1 + 3 / 10 * 2 + 2 / 10 * 3 + 1 / 10 * 3
    -- Expected length 1.90 lies in [H, H + 1)
    H ≤ El ∧ El < H + 1 := by
  norm_num