Shannon Entropy
Statement
Let $X$ be a discrete random variable taking values in $\{x_1, \dots, x_n\}$ with probability distribution $p = (p_1, \dots, p_n)$, where $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. The Shannon entropy is
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$
(with the convention $0 \log 0 = 0$). The entropy satisfies:
- $H(X) \ge 0$ for all distributions.
- $H(X) = 0$ if and only if one $p_i = 1$ and the rest are zero.
- $H(X)$ is maximized at $\log n$ when $p_i = 1/n$ for all $i$ (a quick check follows this list).
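As a quick check of the last property, for the uniform distribution $p_i = 1/n$ every term is the same and the sum collapses:
$$H(X) = -\sum_{i=1}^{n} \frac{1}{n} \log\frac{1}{n} = n \cdot \frac{1}{n} \log n = \log n .$$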
For a Bernoulli source with parameter $p$ the entropy reduces to the binary entropy function
$$h(p) = -p \log p - (1 - p) \log(1 - p),$$
which peaks at $p = 1/2$.
Visualization
Binary entropy $h(p)$ for selected values (natural log; max $\log 2 \approx 0.693$ at $p = 1/2$):
| $p$ | $-p\log p$ | $-(1-p)\log(1-p)$ | $h(p)$ |
|------|-------|-------|-------|
| 0.00 | 0.000 | 0.000 | 0.000 |
| 0.10 | 0.230 | 0.095 | 0.325 |
| 0.20 | 0.322 | 0.178 | 0.500 |
| 0.30 | 0.361 | 0.250 | 0.611 |
| 0.40 | 0.366 | 0.306 | 0.673 |
| 0.50 | 0.347 | 0.347 | 0.693 |
The curve is strictly concave and symmetric about $p = 1/2$.
ASCII sketch of $h(p)$:
 h
 |           *
 |       *       *
 |     *           *
 |   *               *
 |  *                 *
 +----------------------> p
 0          1/2        1
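A minimal numerical sketch to reproduce the table values, assuming Lean 4 with only core `Float` (the helper names `negMulLogFloat` and `binEntropyFloat` are ad hoc, not Mathlib definitions):

-- Ad hoc Float helpers for a numeric check (natural log); not part of Mathlib.
def negMulLogFloat (x : Float) : Float :=
  if x ≤ 0 then 0 else -x * Float.log x

def binEntropyFloat (p : Float) : Float :=
  negMulLogFloat p + negMulLogFloat (1 - p)

#eval binEntropyFloat 0.1  -- ≈ 0.325
#eval binEntropyFloat 0.3  -- ≈ 0.611
#eval binEntropyFloat 0.5  -- ≈ 0.693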
Proof Sketch
- Each term $-p_i \log p_i$ equals `negMulLog pᵢ`, the function $x \mapsto -x \log x$.
- Mathlib proves `negMulLog_nonneg`: for $0 \le x \le 1$ we have $0 \le -x \log x$, because $\log x \le 0$ on $(0, 1]$.
- A sum of non-negative terms is non-negative, so $H(X) \ge 0$.
- At $p_i = 0$ or $p_i = 1$ a term vanishes by continuity ($0 \log 0 = 0$ by convention). At $p = 1/2$ the derivative $h'(p)$ vanishes, confirming the maximum (the computation is spelled out after this list).
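Spelling out the derivative step (natural log, $0 < p < 1$):
$$h'(p) = -\log p - 1 + \log(1-p) + 1 = \log\frac{1-p}{p}, \qquad h''(p) = -\frac{1}{p(1-p)} < 0,$$
so $h$ is strictly concave, its unique critical point is $p = 1/2$, and $h(1/2) = \log 2 \approx 0.693$.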
Connections
The concavity of entropy underlies the proof of Channel Capacity and drives the argument in the Source Coding Theorem. The same log-sum inequality appears in Markov's Inequality-style bounds via Jensen's inequality.
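One concrete instance of that Jensen step, using the concavity of $\log$ (terms with $p_i = 0$ are dropped, since they contribute nothing):
$$H(X) = \sum_{i \,:\, p_i > 0} p_i \log\frac{1}{p_i} \;\le\; \log\Bigl(\sum_{i \,:\, p_i > 0} p_i \cdot \frac{1}{p_i}\Bigr) \;\le\; \log n,$$
with equality throughout exactly when $p_i = 1/n$ for all $i$.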
Lean4 Proof
import Mathlib.Analysis.SpecialFunctions.Log.NegMulLog
import Mathlib.Analysis.SpecialFunctions.BinaryEntropy  -- provides `Real.binEntropy`
open Real
/-- Each term -p * log p is non-negative for p in [0,1]. -/
theorem negMulLog_term_nonneg (p : ℝ) (hp0 : 0 ≤ p) (hp1 : p ≤ 1) :
    0 ≤ negMulLog p :=
  negMulLog_nonneg hp0 hp1
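/-- A sketch of the summation step from the proof outline (our statement, not a
named Mathlib lemma): if every coordinate of `p` lies in `[0, 1]`, the entropy
sum is non-negative, via `Finset.sum_nonneg`. -/
theorem entropy_sum_nonneg {n : ℕ} (p : Fin n → ℝ)
    (hp0 : ∀ i, 0 ≤ p i) (hp1 : ∀ i, p i ≤ 1) :
    0 ≤ Finset.univ.sum (fun i : Fin n => negMulLog (p i)) :=
  Finset.sum_nonneg fun i _ => negMulLog_nonneg (hp0 i) (hp1 i)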
/-- Binary entropy h(p) = -p log p - (1-p) log(1-p) is non-negative on [0,1].
Mathlib's `binEntropy_nonneg` covers this directly. -/
theorem binEntropy_nonneg' (p : ℝ) (hp0 : 0 ≤ p) (hp1 : p ≤ 1) :
    0 ≤ binEntropy p :=
  binEntropy_nonneg hp0 hp1

Referenced by
- Source Coding Theorem (Information Theory)
- Fano's Inequality (Information Theory)
- Mutual Information (Information Theory)