Gradient Descent Convergence
Statement
Let $f : \mathbb{R}^d \to \mathbb{R}$ be convex and $L$-smooth (i.e., $\nabla f$ is $L$-Lipschitz). The gradient descent update with step $\alpha \le 1/L$:
$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$
satisfies, for the distance to the minimiser $x^*$:
$$\|x_{k+1} - x^*\|^2 \le (1 - \alpha\mu)\,\|x_k - x^*\|^2$$
when $f$ is additionally $\mu$-strongly convex. For plain convex $f$, the objective error decreases as $f(x_k) - f(x^*) = O(1/k)$.
Concrete instance. $f(x) = x^2/2$, so $f'(x) = x$ and $L = \mu = 1$.
Update: $x_1 = x_0 - 1 \cdot x_0 = 0$ — convergence in one step with step size $\alpha = 1/L = 1$.
With step $\alpha < 1$ (undershoot): $x_{k+1} = (1 - \alpha)\,x_k$, geometric convergence with ratio $1 - \alpha$.
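The undershoot case is a one-line algebraic identity that `ring` can discharge; a minimal Lean sketch in the style of the proofs below (the theorem name is my own):

```lean
import Mathlib.Data.Real.Basic

/-- For f(x) = x²/2 (so f'(x) = x), one gradient step with any step size α
    contracts the iterate by exactly the factor (1 - α). -/
theorem gd_half_quadratic_undershoot (α x : ℝ) :
    x - α * x = (1 - α) * x := by ring
```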
Visualization
Gradient descent on $f(x) = x^2$ starting at $x_0 = 2$, step $\alpha = 0.4$:
Update rule: $x_{n+1} = x_n - 0.4 \cdot 2x_n = 0.2\,x_n$.
| $n$ | $x_n$ | $f(x_n)$ | $|x_n - x^*|$ |
|---|---|---|---|
| 0 | 2.000 | 4.000 | 2.000 |
| 1 | 0.400 | 0.160 | 0.400 |
| 2 | 0.080 | 0.0064 | 0.080 |
| 3 | 0.016 | 0.000256 | 0.016 |
| 4 | 0.0032 | 0.0000102 | 0.0032 |
Contraction factor per step: $|x_{n+1}| = 0.2\,|x_n|$, so $x_n = 0.2^n\,x_0$ and $f(x_n) = 0.04^n\,f(x_0)$.
```text
 f(x)
   4  |*
      |
   1  |   *
      |
 0.1  |      *
      |         *  *  *  ...
   0  +--+--+--+--+---
      0  1  2  3  4   n
```
The objective decreases geometrically.
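The geometric decrease can also be checked symbolically over two composed steps; a small Lean sketch consistent with the update rule above (the theorem name is my own):

```lean
import Mathlib.Data.Real.Basic

/-- Two gradient steps on f(x) = x² with α = 0.4 compose to the factor
    0.2² = 0.04, matching the table: x₂ = 0.04 · x₀. -/
theorem gd_quadratic_two_steps (x : ℝ) :
    (x - 0.4 * (2 * x)) - 0.4 * (2 * (x - 0.4 * (2 * x))) = 0.04 * x := by
  ring
```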
Proof Sketch
- $L$-smoothness descent lemma: with $\alpha \le 1/L$, $f(x_{k+1}) \le f(x_k) - \tfrac{\alpha}{2}\|\nabla f(x_k)\|^2$.
- Convexity: $f(x^*) \ge f(x_k) + \langle \nabla f(x_k),\, x^* - x_k \rangle$.
- Combine: the squared distance to $x^*$ decreases by a factor $(1 - \alpha\mu)$ each step.
- For $f(x) = x^2/2$: $f'(x) = x$, $L = \mu = 1$. Step $\alpha = 1$: $x_1 = x_0 - x_0 = 0$. With $\alpha < 1$: $x_{k+1} = (1 - \alpha)\,x_k$.
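The "combine" step can be expanded explicitly; this is the standard derivation, sketched under the assumptions $\alpha \le 1/L$, $\mu$-strong convexity, and $\nabla f(x^*) = 0$:

$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &= \|x_k - x^* - \alpha\nabla f(x_k)\|^2 \\
&= \|x_k - x^*\|^2 - 2\alpha\langle \nabla f(x_k),\, x_k - x^* \rangle + \alpha^2 \|\nabla f(x_k)\|^2 \\
&\le \|x_k - x^*\|^2 - \alpha(2 - \alpha L)\,\langle \nabla f(x_k),\, x_k - x^* \rangle \\
&\le (1 - \alpha\mu)\,\|x_k - x^*\|^2,
\end{aligned}$$

using co-coercivity $\|\nabla f(x_k)\|^2 \le L\,\langle \nabla f(x_k),\, x_k - x^* \rangle$ in the third line, and strong convexity $\langle \nabla f(x_k),\, x_k - x^* \rangle \ge \mu\|x_k - x^*\|^2$ together with $2 - \alpha L \ge 1$ in the last.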
Connections
- Convex Function (a function is convex when the chord between any two points lies above the graph) — $L$-smoothness is a quantitative strengthening of convexity; the convergence rate depends on both
- Mean Value Theorem (there exists a point where the instantaneous rate of change equals the average rate of change) — the $L$-smoothness bound is a Lipschitz condition proved via the MVT
- Cauchy Criterion (a sequence converges if and only if its terms eventually become arbitrarily close to each other, no candidate limit required) — geometric decrease of the step sizes $\|x_{n+1} - x_n\|$ implies the sequence $(x_n)$ is Cauchy, hence convergent
- Monotone Convergence Theorem (a bounded monotone sequence of reals always converges; the supremum is the limit) — the objective sequence $f(x_n)$ is monotone decreasing and bounded below, so it converges
Lean4 Proof
```lean
import Mathlib.Data.Real.Basic

/-- Gradient descent on f(x) = x²/2 with step α = 1 converges in one step.
    Starting at x₀, update x₁ = x₀ - 1 * x₀ = 0. -/
theorem gd_quadratic_one_step (x0 : ℝ) :
    x0 - 1 * x0 = 0 := by ring

/-- Geometric convergence for f(x) = x², step α = 0.4:
    x_{n+1} = x_n - 0.4 * 2 * x_n = 0.2 * x_n. -/
theorem gd_quadratic_contraction (x : ℝ) :
    x - 0.4 * (2 * x) = 0.2 * x := by ring
```