Source Coding Theorem
Statement
Let $X$ be a discrete random variable with distribution $(p_1, \dots, p_n)$ and Shannon entropy $H(X) = -\sum_i p_i \log_2 p_i$ (bits). For any prefix-free binary code with codeword lengths $\ell_1, \dots, \ell_n$, the expected length $\mathbb{E}[\ell] = \sum_i p_i \ell_i$ satisfies $\mathbb{E}[\ell] \ge H(X)$.
Moreover, the Huffman code (optimal prefix-free code) achieves $H(X) \le \mathbb{E}[\ell] < H(X) + 1$.
Together: no prefix code can do better than $H(X)$ bits per symbol on average, and Huffman coding comes within 1 bit.
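Both halves of the statement can be sanity-checked numerically. The sketch below (illustrative; the function names are my own) uses the fact, guaranteed by Kraft's inequality, that a prefix-free binary code with lengths $\ell_1, \dots, \ell_n$ exists iff $\sum_i 2^{-\ell_i} \le 1$, so the optimal expected length can be found by brute-force search over Kraft-feasible length tuples rather than over codes:

```python
import math
from itertools import product

def entropy(p):
    """Shannon entropy of a distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def optimal_prefix_length(p, max_len=8):
    """Minimum expected length over all prefix-free binary codes.

    By Kraft's inequality, lengths (l_1,...,l_n) are achievable by some
    prefix-free code iff sum(2^-l_i) <= 1, so we search length tuples
    directly instead of enumerating codes.
    """
    best = math.inf
    for lengths in product(range(1, max_len + 1), repeat=len(p)):
        if sum(2.0 ** -l for l in lengths) <= 1.0:
            best = min(best, sum(pi * l for pi, l in zip(p, lengths)))
    return best

p = [0.4, 0.3, 0.2, 0.1]
H = entropy(p)
opt = optimal_prefix_length(p)
print(H, opt)
assert H <= opt < H + 1   # the theorem's sandwich
```

For the four-symbol source of the worked example, the search returns an optimum of 1.9 bits, which the Huffman code attains.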
Visualization
Source with four symbols and distribution $P = (0.40, 0.30, 0.20, 0.10)$:
| Symbol | $p_i$ | Huffman code | $\ell_i$ | $p_i \ell_i$ | $-p_i \log_2 p_i$ |
|---|---|---|---|---|---|
| a | 0.40 | 0 | 1 | 0.40 | 0.529 |
| b | 0.30 | 10 | 2 | 0.60 | 0.521 |
| c | 0.20 | 110 | 3 | 0.60 | 0.464 |
| d | 0.10 | 111 | 3 | 0.30 | 0.332 |
$H(X) = 0.529 + 0.521 + 0.464 + 0.332 = 1.846$ bits.
$\mathbb{E}[\ell] = 0.40 + 0.60 + 0.60 + 0.30 = 1.90$ bits.
Check: $1.846 \le 1.90 < 1.846 + 1 = 2.846$. Both inequalities hold.
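The per-row arithmetic of the table can be reproduced in a few lines (a sketch; the dictionaries simply restate the table):

```python
import math

p = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.10}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}   # Huffman lengths from the table

# Entropy contribution and length contribution per symbol
H = sum(-pi * math.log2(pi) for pi in p.values())
E = sum(p[s] * lengths[s] for s in p)

for s in p:
    print(s, round(p[s] * lengths[s], 3), round(-p[s] * math.log2(p[s]), 3))
print(f"H = {H:.3f} bits, E[l] = {E:.2f} bits")
assert H <= E < H + 1
```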
Huffman tree construction (merge two smallest at each step):
```
Step 1: merge c(0.20) + d(0.10) -> cd(0.30)
Step 2: merge b(0.30) + cd(0.30) -> bcd(0.60)
Step 3: merge a(0.40) + bcd(0.60) -> root(1.00)

root
|-- 0 : a
|-- 1
    |-- 10 : b
    |-- 11
        |-- 110 : c
        |-- 111 : d
```
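The merge procedure above is easy to mechanize with a binary heap. This sketch (my own helper, not part of the article) tracks codeword lengths only: each time the two least-probable subtrees are merged, every symbol inside them moves one level deeper:

```python
import heapq

def huffman_lengths(probs):
    """Huffman codeword lengths: repeatedly merge the two least-probable
    subtrees, incrementing the depth of every symbol they contain."""
    heap = [(p, [sym]) for sym, p in probs.items()]
    depth = {sym: 0 for sym in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, syms1 = heapq.heappop(heap)
        p2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            depth[s] += 1   # merged symbols sit one level deeper
        heapq.heappush(heap, (p1 + p2, syms1 + syms2))
    return depth

lengths = huffman_lengths({"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.10})
print(lengths)   # -> {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

The resulting lengths (1, 2, 3, 3) match the tree above; the actual 0/1 labels per branch are a free choice and do not affect $\mathbb{E}[\ell]$.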
Proof Sketch
- Lower bound ($\mathbb{E}[\ell] \ge H(X)$): Set $q_i = 2^{-\ell_i}/c$ where $c = \sum_j 2^{-\ell_j} \le 1$ by Kraft's inequality. The Gibbs inequality (non-negativity of KL divergence) gives $\sum_i p_i \log_2 (p_i/q_i) \ge 0$, which expands to $\mathbb{E}[\ell] \ge H(X) - \log_2 c \ge H(X)$.
- Upper bound ($\mathbb{E}[\ell] < H(X) + 1$): Choose $\ell_i = \lceil \log_2 (1/p_i) \rceil$. These satisfy Kraft's inequality ($2^{-\ell_i} \le p_i$, so $\sum_i 2^{-\ell_i} \le 1$), so a prefix-free code exists. Then $\ell_i < \log_2 (1/p_i) + 1$; summing against $p_i$ gives $\mathbb{E}[\ell] < H(X) + 1$.
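The upper-bound construction can be checked directly. This sketch (illustrative; the test distribution is randomly generated) builds the lengths $\ell_i = \lceil \log_2(1/p_i) \rceil$, confirms they are Kraft-feasible, and confirms the expected length lands in $[H, H+1)$:

```python
import math
import random

def shannon_lengths(p):
    """The lengths l_i = ceil(log2(1/p_i)) used in the upper-bound proof."""
    return [math.ceil(-math.log2(pi)) for pi in p]

random.seed(0)
w = [random.random() for _ in range(6)]
p = [wi / sum(w) for wi in w]          # an arbitrary 6-symbol distribution

lengths = shannon_lengths(p)
kraft = sum(2.0 ** -l for l in lengths)
H = -sum(pi * math.log2(pi) for pi in p)
E = sum(pi * l for pi, l in zip(p, lengths))

assert kraft <= 1.0        # a prefix-free code with these lengths exists
assert H <= E < H + 1      # the sandwich from the theorem
print(kraft, E - H)
```

The same check passes for any distribution, since $2^{-\lceil \log_2(1/p_i) \rceil} \le p_i$ term by term.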
Connections
The lower bound relies on the non-negativity of KL divergence (Gibbs' inequality), which follows from the concavity of the logarithm; see Shannon Entropy. The Kraft feasibility condition is Kraft's Inequality. The bound has the same form as the AM–GM Inequality applied to log terms via Jensen's inequality.
Lean4 Proof
```lean
/-- Verify both inequalities for the concrete source P = (0.4, 0.3, 0.2, 0.1)
    with Huffman lengths (1, 2, 3, 3) (bits, rational arithmetic). -/
theorem source_coding_concrete :
    let H : ℚ := 529 / 1000 + 521 / 1000 + 464 / 1000 + 332 / 1000
    let El : ℚ := 4 * 1 / 10 + 3 * 2 / 10 + 2 * 3 / 10 + 1 * 3 / 10
    -- Expected length 1.90 lies in [H, H + 1)
    H ≤ El ∧ El < H + 1 := by
  norm_num
```

Referenced by
- Channel Capacity (Information Theory)
- Kraft's Inequality (Information Theory)
- Shannon Entropy (Information Theory)