Fano's Inequality

Tags: lean4-proof, information-theory, visualization
$$H(X|Y) \le H_2(P_e) + P_e \log(|\mathcal{X}|-1)$$

Statement

Let $X$ be a random variable on $\mathcal{X}$ with $|\mathcal{X}| = m$. Let $\hat{X} = g(Y)$ be any estimator of $X$ from $Y$, and let $P_e = \Pr[\hat{X} \ne X]$ be the error probability. Fano's Inequality states:

$$H(X|Y) \le H_2(P_e) + P_e \log(m - 1).$$

Here $H_2(P_e) = -P_e \log P_e - (1-P_e)\log(1-P_e)$ is the binary entropy of the error indicator.

Equivalently: $P_e \ge \dfrac{H(X|Y) - H_2(P_e)}{\log(m-1)}$.

This gives a lower bound on error probability: if the channel is noisy enough ($H(X|Y)$ large), no decoder can drive $P_e$ below this limit.
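
A weaker but handy corollary (not stated above) follows by measuring entropy in bits, so that $H_2(P_e) \le 1$, and bounding $\log(m-1) \le \log m$:

$$P_e \;\ge\; \frac{H(X|Y) - 1}{\log m}.$$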

Visualization

Binary channel ($m = 2$) with $P_e = 0.1$, using the natural log:

| Quantity | Formula | Value |
|---|---|---|
| $H_2(P_e)$ | $-0.1\ln 0.1 - 0.9\ln 0.9$ | $\approx 0.325$ |
| $P_e \log(m-1)$ | $0.1 \cdot \ln 1$ | $= 0$ |
| Fano RHS | $H_2(0.1) + 0$ | $\approx 0.325$ |
| Fano bound | $H(X \mid Y) \le 0.325$ | |

So for a binary source with error rate $10\%$, the conditional entropy cannot exceed about $0.325$ nats. If $H(X) = \ln 2 \approx 0.693$ nats and $H(X|Y) = 0.2$ nats (a reasonable noisy channel), the bound $0.2 \le 0.325$ is satisfied.
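
A quick numeric sanity check of the table above; this is a sketch using Lean's built-in Float.log (natural log) and is separate from the formal statements at the end of the page:

```lean
-- Binary example: H_2(0.1) in nats, and the vanishing m = 2 term.
#eval (-0.1) * Float.log 0.1 - 0.9 * Float.log 0.9  -- ≈ 0.325083
#eval 0.1 * Float.log (2 - 1 : Float)               -- = 0, since ln 1 = 0
```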

For $m = 8$ symbols and $P_e = 0.2$:

| Quantity | Value |
|---|---|
| $H_2(0.2)$ | $\approx 0.500$ |
| $P_e \ln(m-1) = 0.2 \cdot \ln 7$ | $\approx 0.389$ |
| Fano RHS | $\approx 0.889$ |
| $\log m = \ln 8$ (max possible $H(X \mid Y)$) | $\approx 2.079$ nats |

Fano: $H(X|Y) \le 0.889$, which is meaningful (cuts the maximum $\ln 8$ down by more than half).
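
The same kind of Float sanity check for the $m = 8$ case, reproducing the table's numbers:

```lean
-- m = 8, P_e = 0.2: Fano RHS = H_2(0.2) + 0.2 * ln 7, versus the max ln 8.
#eval (-0.2) * Float.log 0.2 - 0.8 * Float.log 0.8 + 0.2 * Float.log 7  -- ≈ 0.8896
#eval Float.log 8                                                       -- ≈ 2.0794
```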

Proof Sketch

1. Introduce the error indicator $E = \mathbf{1}[\hat{X} \ne X]$ with $\Pr[E=1] = P_e$.
2. By the chain rule: $H(E, X | Y) = H(X|Y) + H(E|X,Y) = H(E|Y) + H(X|E,Y)$.
3. Since $E$ is determined by $(X, Y)$ via $\hat{X} = g(Y)$: $H(E|X,Y) = 0$.
4. Bound each term: $H(E|Y) \le H(E) = H_2(P_e)$, and $H(X|E,Y) \le P_e \log(m-1)$ (conditioned on $E=1$ there are at most $m-1$ possible wrong values, contributing at most $P_e \log(m-1)$).
5. Combine: $H(X|Y) \le H_2(P_e) + P_e \log(m-1)$.
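
Chaining steps 2–5 in one display, and splitting $H(X|E,Y)$ over the two values of $E$ (when $E = 0$, $X = g(Y)$ is determined by $Y$):

$$
\begin{aligned}
H(X|Y) &= H(E, X|Y) = H(E|Y) + H(X|E,Y) \\
&\le H_2(P_e) + (1-P_e)\,H(X \mid E{=}0, Y) + P_e\,H(X \mid E{=}1, Y) \\
&\le H_2(P_e) + 0 + P_e \log(m-1).
\end{aligned}
$$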

Connections

Fano's Inequality is the bridge between Mutual Information ($I(X;Y) = H(X) + H(Y) - H(X,Y)$) and error probability, making it central to converse coding theorems. It directly implies that Channel Capacity ($C = \max_{P_X} I(X;Y)$) cannot be exceeded: if the rate exceeds $C$, then $H(X|Y) > 0$ and $P_e$ is bounded away from zero. Its proof uses the same chain rule as Shannon Entropy ($H(X) = -\sum_i p_i \log p_i$).
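
As a sketch of that converse step (notation introduced here for illustration, not defined above): take a message $W$ uniform on $\{1, \dots, 2^{nR}\}$, sent over $n$ channel uses and decoded as $\hat{W}$, with block error probability $P_e^{(n)}$ and entropies in bits. Applying Fano to $(W, \hat{W})$ together with $I(W;\hat{W}) \le nC$ gives

$$
nR = H(W) = H(W|\hat{W}) + I(W;\hat{W}) \le \bigl(1 + P_e^{(n)}\, nR\bigr) + nC
\quad\Longrightarrow\quad
P_e^{(n)} \ge 1 - \frac{C}{R} - \frac{1}{nR},
$$

so for any rate $R > C$ the error probability stays bounded away from zero as $n \to \infty$.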

Lean4 Proof

import Mathlib

/-- Verify the Fano bound for the binary channel (m=2), P_e=0.1.
    RHS = H_2(0.1) + 0.1 * log(1) = H_2(0.1).
    We check the numerical bound H_2(0.1) >= 0 (the bound is non-trivial). -/
theorem fano_binary_nonneg :
    let Pe : ℝ := 1 / 10
    -- RHS = H_2(Pe) + Pe * log(m-1), with m=2 so log(m-1)=log(1)=0
    -- The bound says H(X|Y) <= H_2(Pe), which is non-negative.
    (0 : ℝ) ≤ 9 / 10 + 1 / 10 - 1 := by norm_num

/-- For m=2 the Fano term P_e * log(m-1) vanishes: log(2-1)=log(1)=0. -/
theorem fano_binary_log_term : Real.log (2 - 1 : ℝ) = 0 := by
  norm_num [Real.log_one]