Mutual Information
Statement
The mutual information between discrete random variables $X$ and $Y$ is
$$
I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)},
$$
where $p(x,y)$ is the joint distribution and $p(x)$, $p(y)$ are its marginals (logs base 2 give values in bits).
Key properties:
- $I(X;Y) \ge 0$, with equality iff $X$ and $Y$ are independent.
- $I(X;Y) = I(Y;X)$ (symmetry).
- $I(X;X) = H(X)$ (a variable is maximally informative about itself).
- $I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y)$ (checked numerically in the sketch below).
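As a quick numerical check of the last two identities, here is a minimal sketch, not from the original page: it works over `Float` with natural logs rather than Mathlib's reals, and `entropy` and `mi2x2` are hypothetical helper names.

```lean
/-- Entropy (in nats) of a list of probabilities,
    with the convention 0 * log 0 = 0. (Illustrative helper.) -/
def entropy (ps : List Float) : Float :=
  ps.foldl (fun acc p => acc - (if p > 0 then p * Float.log p else 0)) 0

/-- Mutual information of a 2×2 joint [[p00, p01], [p10, p11]],
    computed as H(X) + H(Y) - H(X,Y), in nats. (Illustrative helper.) -/
def mi2x2 (p00 p01 p10 p11 : Float) : Float :=
  entropy [p00 + p01, p10 + p11]      -- H(X): row marginals
    + entropy [p00 + p10, p01 + p11]  -- H(Y): column marginals
    - entropy [p00, p01, p10, p11]    -- H(X,Y): joint

-- I(X;X) = H(X): a diagonal joint recovers the marginal entropy.
#eval mi2x2 0.3 0 0 0.7   -- ≈ 0.6109
#eval entropy [0.3, 0.7]  -- ≈ 0.6109
```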
Visualization
Three joint distributions over $\{0,1\} \times \{0,1\}$:
Case 1: Independent ($I = 0$)

| $p(x,y)$ | $y = 0$ | $y = 1$ |
| --- | --- | --- |
| $x = 0$ | 0.25 | 0.25 |
| $x = 1$ | 0.25 | 0.25 |

$H(X) = H(Y) = 1$ bit; $H(X,Y) = 2$ bits; $I(X;Y) = 0$.
Case 2: $Y = X$ ($I = 1$ bit)

| $p(x,y)$ | $y = 0$ | $y = 1$ |
| --- | --- | --- |
| $x = 0$ | 0.50 | 0.00 |
| $x = 1$ | 0.00 | 0.50 |

$H(X) = H(Y) = 1$ bit; $H(X,Y) = 1$ bit; $I(X;Y) = 1$ bit.
Case 3: Partial dependence ($I \approx 0.278$ bits)

| $p(x,y)$ | $y = 0$ | $y = 1$ |
| --- | --- | --- |
| $x = 0$ | 0.40 | 0.10 |
| $x = 1$ | 0.10 | 0.40 |

$H(X) = H(Y) = 1$ bit; $H(X,Y) \approx 1.722$ bits; $I(X;Y) \approx 0.278$ bits.
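To see where the Case 3 value comes from, here is the arithmetic spelled out (both marginals are uniform, so $H(X) = H(Y) = 1$ bit):
$$
H(X,Y) = -2(0.4)\log_2 0.4 - 2(0.1)\log_2 0.1 \approx 1.058 + 0.664 = 1.722 \text{ bits},
$$
$$
I(X;Y) = H(X) + H(Y) - H(X,Y) \approx 1 + 1 - 1.722 = 0.278 \text{ bits}.
$$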
All three confirm $I(X;Y) \ge 0$.
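The three values can be reproduced with the hypothetical `mi2x2` sketch from above; it returns nats, so divide by $\ln 2 \approx 0.693$ to get bits.

```lean
-- Reusing the illustrative `mi2x2` helper defined earlier (output in nats).
#eval mi2x2 0.25 0.25 0.25 0.25  -- ≈ 0.0     (Case 1: independent)
#eval mi2x2 0.50 0.00 0.00 0.50  -- ≈ 0.6931  (Case 2: 1 bit)
#eval mi2x2 0.40 0.10 0.10 0.40  -- ≈ 0.1927  (Case 3: ≈ 0.278 bits)
```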
Proof Sketch
- Write $I(X;Y) = D_{\mathrm{KL}}\bigl(p(x,y) \,\|\, p(x)\,p(y)\bigr)$, the Kullback-Leibler divergence between the joint distribution and the product of the marginals.
- By the log-sum inequality (a consequence of Jensen's inequality applied to the convex function $t \mapsto t \log t$), $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$ for any distributions $p$ and $q$.
- Equality holds iff $p(x,y) = p(x)\,p(y)$ for all $(x,y)$, i.e., $X$ and $Y$ are independent.
- The formula $I(X;Y) = H(X) + H(Y) - H(X,Y)$ follows by expanding the logarithm of the ratio, as spelled out below.
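Spelling out that last step (a standard expansion, added here for completeness):
$$
\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y)\,\bigl[\log p(x,y) - \log p(x) - \log p(y)\bigr] \\
&= -H(X,Y) - \sum_x p(x)\log p(x) - \sum_y p(y)\log p(y) \\
&= H(X) + H(Y) - H(X,Y),
\end{aligned}
$$
using $\sum_y p(x,y) = p(x)$ and $\sum_x p(x,y) = p(y)$ to collapse the marginal sums.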
Connections
Mutual information is the fundamental quantity whose maximization defines Channel Capacity. Its non-negativity follows from the same convexity argument underlying Shannon Entropy's concavity, which also appears in Chebyshev's Inequality-style arguments.
Lean4 Proof
```lean
import Mathlib.Analysis.SpecialFunctions.Log.NegMulLog

open Real

/-- Shannon entropy of a 2-point distribution with probabilities p and (1-p):
    H(p) = -p * log p - (1-p) * log (1-p). -/
noncomputable def binaryEntropy (p : ℝ) : ℝ :=
  -(p * Real.log p) - ((1 - p) * Real.log (1 - p))

/-- Mutual information for a joint distribution on a finite type,
    defined as I(X;Y) = H(X) + H(Y) - H(X,Y). -/
noncomputable def mutualInfo {α β : Type*} [Fintype α] [Fintype β]
    (joint : α → β → ℝ) : ℝ :=
  let margX : α → ℝ := fun a => ∑ b, joint a b
  let margY : β → ℝ := fun b => ∑ a, joint a b
  let hX := -∑ a, margX a * Real.log (margX a)
  let hY := -∑ b, margY b * Real.log (margY b)
  let hXY := -∑ a, ∑ b, joint a b * Real.log (joint a b)
  hX + hY - hXY

/-- I(X;X) = H(X): a variable is maximally informative about itself.
    For the diagonal joint (p(a,b) = p_a when a = b, else 0):
    H(X,X) = H(X), so I = H(X) + H(X) - H(X) = H(X). -/
theorem mutual_info_self_eq_entropy {α : Type*} [DecidableEq α] [Fintype α]
    (pmf : α → ℝ) :
    let diagJoint : α → α → ℝ := fun a b => if a = b then pmf a else 0
    mutualInfo diagJoint = -∑ a, pmf a * Real.log (pmf a) := by
  simp [mutualInfo, Finset.sum_ite_eq', Finset.mem_univ]
  ring
```
Referenced by
- Channel Capacity (Information Theory)
- Fano's Inequality (Information Theory)