Hierarchical Navigable Small World (HNSW)

Premier
graph-theoryalgorithmsnearest-neighborvisualizationhnsw
Q(N)=O(logN)Q(N) = O(\log N)

Statement

HNSW (Malkov & Yashunin, 2016) is the dominant index for approximate nearest-neighbour (ANN) search over high-dimensional vectors — the backbone of vector databases, semantic search, and retrieval-augmented generation. Given a set of NN points and a query qq, it returns the (approximate) closest point by traversing a graph instead of scanning all NN vectors, achieving an expected query cost of

Q(N)=O(logN)Q(N) = O(\log N)

The structure is a stack of proximity graphs. Layer 00 contains every point; each higher layer keeps an exponentially thinner random sample. Searching starts at a single entry point in the top layer, greedily walks to the neighbour nearest the query, drops to that same node one layer down, and repeats — coarse long hops up top, fine local refinement at the bottom. It is the graph-theoretic analogue of a skip list.

How it works

HNSW combines two ideas:

  • Navigable small-world (NSW) graphs — proximity graphs that mix short local links with occasional long-range links, so greedy routing reaches any target in a polylogarithmic number of hops (Kleinberg's small-world result).
  • Hierarchy by exponential sampling — separating links into layers by length scale removes the polylog factor that plagues a flat NSW graph, turning search into O(logN)O(\log N).

Grounding: semantic search

To make the query concrete, the interactive figure below indexes a toy semantic embedding — about 28 words placed in four meaning-clusters (animals, fruits, vehicles, space) — so nearest-neighbour search means "given a concept, find the closest word". Pick an example query like wolf, lime, or van and watch the search greedily descend the layer stack and settle inside the right cluster (a ✓ appears when it lands in the expected neighbourhood). The drone (boundary) query sits deliberately between clusters to show the approximate nature of the result — the returned point is ringed in gold, and if it differs from the true nearest neighbour that one is ringed with a dashed green circle. Switch the dataset to Random for the purely abstract view, toggle Stack 2.5D / Flat layer, and step through the walk.

This is exactly the motion behind vector search and retrieval-augmented generation (RAG): an embedding model turns text into vectors, and HNSW retrieves the nearest stored vectors to a query embedding in logarithmic time instead of scanning the whole corpus.

Level assignment

Each inserted node is given a maximum level drawn from a geometric distribution:

l=ln(u)mL,uUniform(0,1)l = \left\lfloor -\ln(u) \cdot m_L \right\rfloor, \qquad u \sim \mathrm{Uniform}(0,1)

The normalization constant mLm_L controls how fast layers thin out. The choice that minimizes overlap between layers (and the expected number of hops) is

mL=1lnMm_L = \frac{1}{\ln M}

where MM is the number of neighbours kept per node. A node at level ll participates in layers 0,1,,l0, 1, \dots, l. Because levels decay geometrically, the expected number of layers is O(logN)O(\log N) and the top layers stay sparse.

Search

Searching for the nearest neighbour of qq runs in two phases:

  1. Greedy descent (layers L1L \to 1). Start at the global entry point. At each layer, repeatedly move to whichever neighbour is closer to qq until no neighbour improves; then descend to the next layer keeping the current best node as the new start.
  2. Beam search at layer 00. Run a best-first search of width ef\mathit{ef} (the size of the dynamic candidate list). Larger ef\mathit{ef} explores more of the base graph, trading speed for recall.

The greedy walk turns the global search into a sequence of local decisions; the only state carried between layers is the single best node found so far.

Complexity

QuantityCost
Expected query timeO(logN)O(\log N)
Expected insertion timeO(logN)O(\log N)
MemoryO(NM)O(N \cdot M)
Layers (expected)O(logN)O(\log N)

The key tunables are MM (graph degree — higher means better recall and more memory), ef_construction\mathit{ef\_construction} (build-time candidate list — better graph quality), and ef_search\mathit{ef\_search} (query-time candidate list — recall vs. latency). The visualization exposes NN, MM, mLm_L, and the search ef\mathit{ef} directly.

Connections

HNSW sits at the intersection of small-world network theory and proximity-graph search. The hierarchy mirrors the skip list: probabilistic level assignment giving expected-logarithmic traversal, lifted from a linked list onto a graph. The base-layer beam search is a graph-restricted variant of best-first / kk-nearest-neighbour search. The construction here builds each layer as a symmetric MM-nearest-neighbour graph for clarity; production implementations (e.g. FAISS, hnswlib, Qdrant) instead insert nodes incrementally with a neighbour-selection heuristic that prunes redundant long edges to keep the graph navigable.

References

  • Malkov, Yu. A. & Yashunin, D. A. (2016). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv:1603.09320 (IEEE TPAMI 2020).
  • Malkov, Yu. et al. (2014). Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems.
  • Kleinberg, J. (2000). The small-world phenomenon: an algorithmic perspective. STOC.
  • Hierarchical navigable small world — Wikipedia

Interactive HNSW Search

A toy semantic dataset: ~28 words laid out in four meaning-clusters (animals, fruits, vehicles, space). Pick an example query — wolf, lime, van — and watch the search greedily descend the layer stack and land on the nearest word. Higher layers keep a thinner sample for long-range hops; layer 0 holds every point for fine search. Switch to Random for the abstract view; drag to orbit, scroll to zoom.