Chernoff Bound

$$P(X \ge \varepsilon) \le e^{-t\varepsilon} M_X(t) \quad \forall\, t \ge 0$$

Statement

Let $X$ be a real-valued random variable on a probability space $(\Omega, \mathcal{F}, P)$, and let $M_X(t) = \mathbb{E}[e^{tX}]$ be its moment generating function. For any $\varepsilon \in \mathbb{R}$ and $t \ge 0$ such that $M_X(t) < \infty$:

$$P(X \ge \varepsilon) \le e^{-t\varepsilon} M_X(t).$$

Optimising over $t$ gives the Chernoff bound:

$$P(X \ge \varepsilon) \le \inf_{t \ge 0} e^{-t\varepsilon} M_X(t) = e^{-\Lambda^*(\varepsilon)}$$

where $\Lambda^*(\varepsilon) = \sup_{t \ge 0} \bigl(t\varepsilon - \log M_X(t)\bigr)$ is the Legendre–Fenchel transform of $\log M_X$.
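For concreteness, here is the transform worked out in the standard Gaussian case (a textbook example, added for illustration):

$$X \sim \mathcal{N}(0,1): \quad \log M_X(t) = \frac{t^2}{2}, \qquad \Lambda^*(\varepsilon) = \sup_{t \ge 0}\Bigl(t\varepsilon - \frac{t^2}{2}\Bigr) = \frac{\varepsilon^2}{2} \quad (\varepsilon \ge 0),$$

giving $P(X \ge \varepsilon) \le e^{-\varepsilon^2/2}$, which matches the true Gaussian tail up to a polynomial factor.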

Visualization

$X \sim \text{Bin}(100, 1/2)$. Bound on $P(X \ge 70)$ via Chernoff vs Chebyshev vs the true value:

P(Bin(100, 1/2) ≥ 70):

Method                      Bound
───────────────────────────────────────────────
True value                  ≈ 3.9 × 10⁻⁵
Markov                      50/70 = 5/7 ≈ 0.714   (useless)
Chebyshev                   Var(X)/20² = 25/400
                          = 0.0625                (weak)
Chernoff (Hoeffding)        e^{-2·100·(0.2)²}
                          = e^{-8} ≈ 3.35×10⁻⁴    (sharp)
Chernoff (optimised)        e^{-100·D(0.7||0.5)}
                          = e^{-8.23} ≈ 2.7×10⁻⁴  (sharpest)
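As a sanity check on these numbers, here is a small self-contained Lean 4 snippet (the helpers `fact`, `choose`, `binTail` are ad hoc, written only for this check, not Mathlib API); the tail is computed with exact natural-number arithmetic and converted to Float only for display:

def fact : Nat → Nat
  | 0 => 1
  | n + 1 => (n + 1) * fact n

def choose (n k : Nat) : Nat :=
  fact n / (fact k * fact (n - k))

-- exact P(Bin(100, 1/2) ≥ 70): sum the pmf over k = 70..100
def binTail : Float :=
  let s := (List.range 31).foldl (fun acc i => acc + choose 100 (70 + i)) 0
  s.toFloat / (2 ^ 100 : Nat).toFloat

#eval binTail                  -- ≈ 3.9e-5  (true value)
#eval (50.0 : Float) / 70.0    -- Markov: 5/7 ≈ 0.714
#eval (25.0 : Float) / 400.0   -- Chebyshev: Var/20² = 0.0625
#eval Float.exp (-8.0)         -- Hoeffding-form Chernoff: ≈ 3.35e-4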

Derivation of Chernoff for Bin(n,p):
  M_X(t) = (1-p+pe^t)^n = ((1+e^t)/2)^100  for p=1/2
  Optimise: t* = log(a(1-p)/(p(1-a))) with a = 0.7, p = 0.5, so t* = log(7/3) ≈ 0.847
  Result: P(X≥70) ≤ exp(-100·D_KL(0.7 || 0.5)) = e^{-8.23} ≈ 2.7×10⁻⁴

where D_KL(0.7||0.5) = 0.7·log(1.4) + 0.3·log(0.6) ≈ 0.0823  (natural log),
so 100 · 0.0823 ≈ 8.23 and the bound e^{-8.23} ≈ 2.7×10⁻⁴ is tighter than the
Hoeffding-form e^{-8} ≈ 3.35×10⁻⁴, as expected from the exact optimisation.
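The same numbers from a two-line Float check (`dkl` is an ad hoc helper for the Bernoulli relative entropy, not a library function):

-- D(a || p) between Bernoulli(a) and Bernoulli(p), natural log
def dkl (a p : Float) : Float :=
  a * Float.log (a / p) + (1 - a) * Float.log ((1 - a) / (1 - p))

#eval dkl 0.7 0.5                      -- ≈ 0.0823
#eval Float.exp (-100 * dkl 0.7 0.5)   -- optimised bound ≈ 2.7e-4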

Simpler Hoeffding form for $X = \sum_{i=1}^n X_i$ with independent $X_i \in [0,1]$ and $p = \mathbb{E}[X]/n$:

$$P\!\left(\frac{X}{n} \ge p + \delta\right) \le e^{-2n\delta^2}.$$

For $n = 100$, $p = 1/2$, $\delta = 0.2$: bound $= e^{-2 \cdot 100 \cdot 0.04} = e^{-8} \approx 3.35 \times 10^{-4}$.
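Wrapped as a reusable helper (hypothetical name `hoeffdingBound`, same Lean 4 Float setup as the checks above):

-- e^{-2nδ²}, the Hoeffding-form Chernoff bound on P(X/n ≥ p + δ)
def hoeffdingBound (n : Nat) (d : Float) : Float :=
  Float.exp (-2 * n.toFloat * d * d)

#eval hoeffdingBound 100 0.2   -- ≈ 3.35e-4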

Proof Sketch

  1. Markov's inequality on $e^{tX}$. For $t > 0$ the map $x \mapsto e^{tx}$ is strictly increasing and $e^{tX} \ge 0$, so $P(X \ge \varepsilon) = P(e^{tX} \ge e^{t\varepsilon}) \le \dfrac{\mathbb{E}[e^{tX}]}{e^{t\varepsilon}} = e^{-t\varepsilon} M_X(t)$; for $t = 0$ the bound holds trivially, since the right-hand side equals $1$.
  2. Optimise over t0t \ge 0. Taking the infimum over all valid tt gives the tightest bound.
  3. Log-convexity. $\log M_X$ is convex, so $t \mapsto t\varepsilon - \log M_X(t)$ is concave and any interior maximiser $t^*$ satisfies $M_X'(t^*)/M_X(t^*) = \varepsilon$ (attainable when $\varepsilon > \mathbb{E}[X]$); the first-order condition is written out below.
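Spelling out the first-order condition in step 3 (a standard computation):

$$\frac{d}{dt}\bigl(t\varepsilon - \log M_X(t)\bigr) = \varepsilon - \frac{M_X'(t)}{M_X(t)} = 0 \quad\Longleftrightarrow\quad \frac{\mathbb{E}[X e^{tX}]}{\mathbb{E}[e^{tX}]} = \varepsilon,$$

i.e. $t^*$ exponentially tilts the distribution of $X$ until the tilted mean equals $\varepsilon$.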

Connections

The Chernoff bound is a vast strengthening of Markov's inequality, $P(X \geq a) \leq \frac{\mathbb{E}[X]}{a}$ (a polynomial tail bound), and of Chebyshev's inequality, $P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$. The optimised form is the rate function of large deviations theory (Cramér's theorem), obtained as the Legendre–Fenchel transform of the cumulant generating function $\log M_X$; the moment generating function is in turn the characteristic function $\phi_X(t) = \mathbb{E}[e^{itX}]$ evaluated at an imaginary argument, and the characteristic function uniquely determines the distribution.

Lean4 Proof

import Mathlib.Probability.Moments.Basic

open ProbabilityTheory MeasureTheory Real

/-- Chernoff bound: the upper tail of X is bounded by e^{-tε} M_X(t).
    Mathlib: `ProbabilityTheory.measure_ge_le_exp_mul_mgf`. -/
theorem chernoff_bound
    {Ω : Type*} {m : MeasurableSpace Ω}
    {μ : Measure Ω} [IsFiniteMeasure μ]
    {X : Ω → ℝ}
    (ε t : ℝ) (ht : 0 ≤ t)
    (h_int : Integrable (fun ω => exp (t * X ω)) μ) :
    μ.real {ω | ε ≤ X ω} ≤ exp (-t * ε) * mgf X μ t :=
  measure_ge_le_exp_mul_mgf ε ht h_int
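
A hypothetical instantiation, in the same file and context as above (here $\varepsilon = 70$, $t = 1$; the integrability hypothesis is simply assumed):

-- minimal usage sketch: plug in concrete ε and t
example {Ω : Type*} {m : MeasurableSpace Ω} {μ : Measure Ω} [IsFiniteMeasure μ]
    {X : Ω → ℝ} (h : Integrable (fun ω => exp (1 * X ω)) μ) :
    μ.real {ω | 70 ≤ X ω} ≤ exp (-1 * 70) * mgf X μ 1 :=
  chernoff_bound 70 1 zero_le_one h

Note the lemma is stated for one fixed $t$; the fully optimised bound $e^{-\Lambda^*(\varepsilon)}$ follows by taking the infimum over admissible $t$ outside the lemma.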