Channel Capacity

Tags: lean4-proof, information-theory, visualization
$$C = \max_{P_X} I(X;Y) = 1 - H_2(p)$$

Statement

The channel capacity of a discrete memoryless channel is

$$C = \max_{P_X} I(X; Y)$$

where the maximum is over all input distributions $P_X$ and $I(X;Y)$ is the mutual information between the channel input $X$ and output $Y$. For the binary symmetric channel (BSC) with crossover probability $p \in [0,1]$:

$$C_{\text{BSC}}(p) = 1 - H_2(p) = 1 + p \log_2 p + (1-p) \log_2(1-p)$$

(measured in bits, with $H_2(p) = -p \log_2 p - (1-p) \log_2(1-p)$ the binary entropy function).

Boundary values: $C(0) = 1$ (perfect channel), $C(1/2) = 0$ (pure noise).
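For a quick numeric sanity check, here is a minimal Lean 4 sketch using the core `Float` API (separate from the Mathlib development below); the helper names `h2` and `bscCapacity` are our own, not from any library:

-- Binary entropy in bits; the 0/1 guard avoids taking log2 of zero.
def h2 (p : Float) : Float :=
  if p == 0 || p == 1 then 0
  else -(p * Float.log2 p) - (1 - p) * Float.log2 (1 - p)

-- BSC capacity in bits: C(p) = 1 - H_2(p).
def bscCapacity (p : Float) : Float := 1 - h2 p

#eval bscCapacity 0.0  -- 1.0 (perfect channel)
#eval bscCapacity 0.1  -- ≈ 0.531 (matches the table below)
#eval bscCapacity 0.5  -- 0.0 (pure noise)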

Visualization

BSC capacity as a function of crossover probability $p$:

| $p$  | $H_2(p)$ (bits) | $C = 1 - H_2(p)$ |
|------|-----------------|------------------|
| 0.00 | 0.000           | 1.000            |
| 0.05 | 0.286           | 0.714            |
| 0.10 | 0.469           | 0.531            |
| 0.15 | 0.610           | 0.390            |
| 0.25 | 0.811           | 0.189            |
| 0.50 | 1.000           | 0.000            |

ASCII sketch of $C(p)$:

C
1 *
  |*
  | *
  |  *
  |    *
  |        *
  |               *
0 +-------------------> p
  0      1/4     1/2

The BSC is symmetric about $p = 1/2$, so $C(p) = C(1-p)$. The capacity-achieving input distribution is uniform ($P_X(0) = P_X(1) = 1/2$).
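Continuing the hypothetical `bscCapacity` sketch from above, the symmetry is easy to spot-check:

-- Appended to the Float sketch above; C(p) = C(1 - p).
#eval bscCapacity 0.1  -- ≈ 0.531
#eval bscCapacity 0.9  -- ≈ 0.531 (same value by symmetry)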

Proof Sketch

  1. For the BSC, $Y = X \oplus Z$ where $Z \sim \text{Bernoulli}(p)$ is independent of $X$.
  2. $I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(Z) = H(Y) - H_2(p)$, since $Z$ independent of $X$ gives $H(Y|X) = H(X \oplus Z \mid X) = H(Z)$.
  3. $H(Y) \le 1$ bit, with equality exactly when $Y$ is uniform, which the uniform input achieves.
  4. Therefore $C = \max_{P_X} H(Y) - H_2(p) = 1 - H_2(p)$ (spot-checked numerically after this list).
  5. At $p = 0$: $H_2(0) = 0$, so $C = 1$. At $p = 1/2$: $H_2(1/2) = 1$, so $C = 0$.
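Steps 2–4 can be spot-checked numerically. This sketch reuses the hypothetical `h2` helper from the earlier Float block (`mutualInfo` is likewise our own name, not a library function); it computes $I(X;Y) = H(Y) - H_2(p)$ for an input distribution with $P_X(0) = q$ and shows the uniform input $q = 1/2$ attaining the maximum:

-- I(X;Y) for a BSC with P(X=0) = q and crossover p; reuses `h2` from above.
def mutualInfo (q p : Float) : Float :=
  let py0 := q * (1 - p) + (1 - q) * p  -- P(Y = 0) by total probability
  h2 py0 - h2 p                         -- H(Y) - H(Y|X) = H(Y) - H_2(p)

#eval mutualInfo 0.5 0.1  -- ≈ 0.531 = 1 - H_2(0.1): uniform input hits capacity
#eval mutualInfo 0.3 0.1  -- ≈ 0.456 < 0.531: a biased input falls short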

Connections

Channel capacity is defined via Mutual Information, $I(X;Y) = H(X) + H(Y) - H(X,Y) \ge 0$, which measures how much knowing $Y$ reduces uncertainty about $X$, and saturated by the Gaussian channel variant of the Cauchy–Schwarz Inequality, $|\langle u, v \rangle| \le \|u\| \, \|v\|$. The noiseless channel ($p = 0$) achieves the limit established by the Source Coding Theorem, Shannon's noiseless coding theorem: optimal prefix code lengths satisfy $H(X) \le \mathbb{E}[\ell] < H(X) + 1$.

Lean4 Proof

import Mathlib.Analysis.SpecialFunctions.BinaryEntropy

open Real

/-- BSC capacity at crossover p=0: no noise means C=1. -/
theorem bsc_capacity_zero : 1 - binEntropy (0 : ℝ) = 1 := by
  simp [binEntropy_zero]

/-- BSC capacity at crossover p=1/2: maximum noise means C=0 in bits. Mathlib's
`binEntropy` uses the natural logarithm, so `binEntropy 2⁻¹ = log 2` (= 1 bit)
and the unnormalized expression evaluates to `1 - log 2`. -/
theorem bsc_capacity_half : 1 - binEntropy (2 : ℝ)⁻¹ = 1 - log 2 := by
  simp [binEntropy_two_inv]

/-- BSC capacity at p=1 equals BSC capacity at p=0, by symmetry of `binEntropy`. -/
theorem bsc_capacity_symm (p : ℝ) : binEntropy p = binEntropy (1 - p) :=
  (binEntropy_one_sub p).symm
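
As a possible extension (a sketch under the assumption that the Mathlib lemmas `binEntropy_two_inv`, `Real.log_pos`, and `div_self` carry the statements used below), normalizing by `log 2` converts nats to bits and recovers the bit-valued boundary statement $C(1/2) = 0$:

/-- In bits (normalizing by `log 2`), BSC capacity at p = 1/2 is 0. -/
theorem bsc_capacity_half_bits : 1 - binEntropy (2 : ℝ)⁻¹ / log 2 = 0 := by
  rw [binEntropy_two_inv, div_self (ne_of_gt (log_pos one_lt_two))]
  norm_num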