Jensen's Inequality (Convex)

Tags: lean4-proof · optimization · visualization
$$f\!\left(\sum_i w_i x_i\right) \le \sum_i w_i f(x_i), \quad w_i \ge 0,\; \sum_i w_i = 1$$

Statement

Let $f : \mathbb{R} \to \mathbb{R}$ be convex. For points $x_1, \ldots, x_n$ and non-negative weights $w_i$ summing to 1:

$$f\!\left(\sum_{i=1}^n w_i x_i\right) \le \sum_{i=1}^n w_i f(x_i).$$

Mathlib states this as `ConvexOn.map_sum_le` (in `Mathlib.Analysis.Convex.Jensen`): if `hf : ConvexOn 𝕜 s f`, the weights are non-negative and sum to 1, and the points lie in $s$, then the Jensen inequality holds.
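The convexity hypothesis for the running example $x \mapsto x^2$ is already in Mathlib. A minimal sketch (assuming the lemma name `Even.convexOn_pow` and using the blanket `import Mathlib` rather than a specific file):

```lean
import Mathlib

-- x ↦ x ^ 2 is convex on all of ℝ because the exponent is even.
-- This provides the `hf : ConvexOn ℝ s f` hypothesis that
-- `ConvexOn.map_sum_le` consumes (here with s = Set.univ).
example : ConvexOn ℝ Set.univ fun x : ℝ => x ^ 2 :=
  even_two.convexOn_pow
```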

Visualization

Take $f(x) = x^2$ (convex) and the distribution $\{x_1, x_2, x_3\} = \{1, 2, 3\}$ with equal weights $w_i = 1/3$.

Weighted average: $\bar{x} = \frac{1+2+3}{3} = 2$

Jensen LHS: $f(\bar{x}) = f(2) = 4$

Jensen RHS: $\frac{1}{3}f(1) + \frac{1}{3}f(2) + \frac{1}{3}f(3) = \frac{1+4+9}{3} = \frac{14}{3} \approx 4.67$

| $x_i$ | $w_i$ | $f(x_i) = x_i^2$ | $w_i f(x_i)$ |
|-------|-------|------------------|--------------|
| 1     | 1/3   | 1                | 1/3          |
| 2     | 1/3   | 4                | 4/3          |
| 3     | 1/3   | 9                | 3            |
| sum   | 1     | 14               | 14/3         |

$$f\!\left(\bar{x}\right) = 4 \le \tfrac{14}{3} \approx 4.67$$

The gap $\tfrac{14}{3} - 4 = \tfrac{2}{3}$ reflects the variance: for $f(x) = x^2$, Jensen's gap equals the variance $\operatorname{Var}(X) = E[X^2] - (E[X])^2$.
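In weighted-sum notation, writing $E[\cdot]$ for the weighted average, the identity behind this is:

$$\sum_i w_i x_i^2 - \Big(\sum_i w_i x_i\Big)^2 = E[X^2] - (E[X])^2 = \operatorname{Var}(X), \qquad \tfrac{14}{3} - 2^2 = \tfrac{2}{3}.$$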

Proof Sketch

  1. Base case $n = 1$: trivial, $f(x_1) \le f(x_1)$.
  2. Inductive step: assuming $w_1 < 1$ (otherwise all weight sits on $x_1$ and we are back in the base case), write $\sum_{i=1}^{n} w_i x_i = w_1 x_1 + (1-w_1)\sum_{i=2}^n \frac{w_i}{1-w_1} x_i$.
  3. Apply convexity once: $f(w_1 x_1 + (1-w_1) z) \le w_1 f(x_1) + (1-w_1) f(z)$, where $z = \sum_{i \ge 2} \frac{w_i}{1-w_1} x_i$.
  4. Apply the inductive hypothesis to $f(z)$, a convex combination of $n-1$ points.
  5. Mathlib's `ConvexOn.map_sum_le` packages this induction.
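The inductive step can be checked numerically on the running example. With $w_1 = 1/3$, the remaining weights renormalize to $\tfrac12, \tfrac12$, so $z = \tfrac12 \cdot 2 + \tfrac12 \cdot 3 = \tfrac52$. The following `norm_num` checks (a sketch of steps 3 and 4 on concrete numbers, not the Mathlib proof itself):

```lean
import Mathlib

-- Step 3, one application of convexity with w₁ = 1/3, x₁ = 1, z = 5/2, f = (·)^2:
-- f(w₁x₁ + (1-w₁)z) ≤ w₁·f(x₁) + (1-w₁)·f(z), i.e. 4 ≤ 4.5.
example : ((1/3 : ℝ) * 1 + (2/3) * (5/2)) ^ 2 ≤ (1/3) * 1 ^ 2 + (2/3) * (5/2) ^ 2 := by
  norm_num

-- Step 4, the inductive hypothesis on z (two-point case): 25/4 ≤ 13/2.
example : ((5/2 : ℝ)) ^ 2 ≤ (1/2) * 2 ^ 2 + (1/2) * 3 ^ 2 := by
  norm_num
```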

Connections

  • Convex Function ($f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$): Jensen's inequality is the multi-point extension of this defining two-point convexity inequality.
  • AM–GM Inequality ($\frac{a_1+\cdots+a_n}{n} \ge \sqrt[n]{a_1 \cdots a_n}$): a special case of Jensen with the convex $f(x) = -\log x$, which reverses the inequality.
  • Cauchy–Schwarz Inequality ($|\langle u, v \rangle| \le \|u\|\,\|v\|$): Cauchy–Schwarz follows from Jensen applied to $f(x) = x^2$ via the Cauchy–Schwarz trick.
  • Markov's Inequality ($P(X \ge a) \le \frac{E[X]}{a}$): Markov's inequality pairs with Jensen in the toolkit of moment-based bounds.
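The AM–GM connection is worth spelling out. Since $\log$ is concave, Jensen's inequality reverses; with equal weights $w_i = 1/n$:

$$\log\!\left(\frac{a_1 + \cdots + a_n}{n}\right) \;\ge\; \frac{1}{n}\sum_{i=1}^n \log a_i \;=\; \log \sqrt[n]{a_1 \cdots a_n},$$

and exponentiating both sides yields AM $\ge$ GM.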

Lean4 Proof

```lean
import Mathlib.Analysis.Convex.Jensen
import Mathlib.Analysis.Convex.Mul

/-- Jensen's inequality for x² with three equal weights 1/3 on {1, 2, 3}:
    f(mean) ≤ mean of f, i.e. 4 ≤ 14/3. -/
theorem jensen_sq_example :
    (((1 : ℝ) + 2 + 3) / 3) ^ 2 ≤
    ((1 : ℝ) ^ 2 + 2 ^ 2 + 3 ^ 2) / 3 := by
  norm_num
```
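The same one-liner works for unequal weights. For example, with weights $(1/2, 1/4, 1/4)$ on the same points (a variant added here for illustration, not part of the original statement):

```lean
import Mathlib.Analysis.Convex.Jensen

-- Jensen for x² with weights 1/2, 1/4, 1/4 on {1, 2, 3}:
-- the weighted mean is 7/4, so LHS = 49/16 and RHS = 15/4.
example :
    ((1/2 : ℝ) * 1 + (1/4) * 2 + (1/4) * 3) ^ 2 ≤
    (1/2) * 1 ^ 2 + (1/4) * 2 ^ 2 + (1/4) * 3 ^ 2 := by
  norm_num
```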