Weak Convergence (Distribution)

lean4-proofprobabilityvisualization
μnwμ    fdμnfdμ for all fCb\mu_n \xrightarrow{w} \mu \iff \int f\,d\mu_n \to \int f\,d\mu \text{ for all } f \in C_b

Statement

Let (Ω,d)(\Omega, d) be a metric space and μn,μ\mu_n, \mu probability measures on the Borel σ\sigma-algebra. The sequence (μn)(\mu_n) converges weakly (in distribution) to μ\mu, written μnwμ\mu_n \xrightarrow{w} \mu, if for every bounded continuous function f:ΩRf : \Omega \to \mathbb{R}:

fdμn    fdμas n.\int f\,d\mu_n \;\to\; \int f\,d\mu \quad \text{as } n \to \infty.

Equivalent characterisations (Portmanteau theorem):

  1. lim supnμn(F)μ(F)\limsup_n \mu_n(F) \le \mu(F) for all closed FF.
  2. lim infnμn(G)μ(G)\liminf_n \mu_n(G) \ge \mu(G) for all open GG.
  3. μn(A)μ(A)\mu_n(A) \to \mu(A) for all AA with μ(A)=0\mu(\partial A) = 0.
  4. Fn(x)F(x)F_n(x) \to F(x) at all continuity points xx of FF (when Ω=R\Omega = \mathbb{R}).

Visualization

Empirical CDFs of Bn=Bin(n,p)/nB_n = \text{Bin}(n, p)/n converging weakly to δp\delta_p (point mass at pp) for p=0.5p = 0.5:

CDF of Bin(n, 1/2)/n  (normalized)

n=5:
1 |              ╔══════════
  |        ╔════╝
0 |════════╝
  +---0---0.5---1

n=20:
1 |          ╔══════════════
  |       ╔══╝
0 |═══════╝
  +---0---0.5---1

n=100:
1 |        ╔═══════════════
  |      ══╝
0 |══════
  +---0---0.5---1

n → ∞: CDF approaches Heaviside step at x = 0.5:
1 |       ╔══════════════
  |       ║
0 |═══════╝
  +---0---0.5---1  (Dirac δ_{0.5})

The empirical means Xˉnp\bar X_n \to p a.s. by the Strong Law of Large NumbersStrong Law of Large Numbers1ni=1nXia.s.E[X1]\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{\text{a.s.}} \mathbb{E}[X_1]The sample mean of i.i.d. integrable random variables converges almost surely to the population meanRead more →, and the distributions Bin(n,p)/n\text{Bin}(n,p)/n converge weakly to δp\delta_p.

Proof Sketch

  1. Topology on probability measures. The weak topology on P(Ω)\mathcal{P}(\Omega) is the coarsest making μfdμ\mu \mapsto \int f\,d\mu continuous for all fCb(Ω)f \in C_b(\Omega).
  2. Metrization. On Polish spaces, weak convergence is metrizable (e.g., Prokhorov metric or bounded Lipschitz distance).
  3. Characterisation. Equivalence of the four Portmanteau conditions follows by approximating indicator functions of open/closed sets by continuous functions (Urysohn's lemma) and monotone convergence.
  4. Tightness and Prokhorov. A family {μn}\{\mu_n\} is tight iff every subsequence has a weakly convergent sub-subsequence (Prokhorov's theorem on Polish spaces).

Connections

The Central Limit TheoremCentral Limit Theoremi=1nXinμσndN(0,1)\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)Standardized sums of i.i.d. finite-variance random variables converge in distribution to the standard normalRead more → is precisely a weak convergence statement: the distributions of ZnZ_n converge weakly to N(0,1)\mathcal{N}(0,1). The Characteristic FunctionCharacteristic FunctionϕX(t)=E[eitX]=eitxdFX(x)\phi_X(t) = \mathbb{E}[e^{itX}] = \int e^{itx}\,dF_X(x)The characteristic function φ_X(t) = E[exp(itX)] uniquely determines the distribution of a random variableRead more → is continuous with respect to weak convergence (Lévy continuity theorem), making it the main tool for proving weak convergence. The underlying topology on probability measures is studied via Heine–Borel TheoremHeine–Borel TheoremKRn  compact    K  closed and boundedK \subseteq \mathbb{R}^n \;\text{compact} \iff K \;\text{closed and bounded}In ℝⁿ, a subset is compact if and only if it is closed and boundedRead more →-type compactness arguments (Prokhorov's theorem).

Lean4 Proof

import Mathlib.MeasureTheory.Measure.ProbabilityMeasure

open MeasureTheory BoundedContinuousFunction Filter

/-- Weak convergence of probability measures: convergence in the weak topology
    is equivalent to convergence of integrals of bounded continuous functions.
    Mathlib: `MeasureTheory.ProbabilityMeasure.tendsto_iff_forall_integral_tendsto`. -/
theorem weak_convergence_iff
    {Ω : Type*} [TopologicalSpace Ω] [MeasurableSpace Ω] [BorelSpace Ω]
    {γ : Type*} {F : Filter γ}
    {μs : γ  ProbabilityMeasure Ω} {μ : ProbabilityMeasure Ω} :
    Tendsto μs F (𝓝 μ) 
       f : Ω , Tendsto
        (fun i => ∫ ω, f ω ∂(μs i : Measure Ω))
        F
        (𝓝 (∫ ω, f ω ∂(μ : Measure Ω))) :=
  ProbabilityMeasure.tendsto_iff_forall_integral_tendsto