SRAL: A Framework for Evaluating Agentic AI Architectures

Sharan, Aakash

The Write Path in Vector Databases (It’s a Distributed Systems Problem)

November 24, 2025 #AI Infrastructure 6 min read

(Where Dense Search Becomes a Distributed Systems Problem)
Most content about vector databases focuses on the glamorous part: fast queries, clever indexing, tight cosine similarity loops.
But if you operate these systems in production, you learn something uncomfortable:

Your system’s correctness, performance, and scalability are defined far more by the write path than by the read path. Dense vector search looks like a retrieval problem. In reality, it’s a distributed systems problem wearing an information-retrieval disguise. This post breaks down how different indexes actually handle writes, why reindexing is so expensive, and what architects must plan for long before deploying a vector DB into RAG, agent workflows, or search systems.

1. Why the Write Path Is the Real System Bottleneck

Every time you insert a new vector, three things must happen:

The index must place it correctly in the ANN structure.
The system must update search metadata so routing and lookups remain correct.
The system must preserve recall guarantees despite the new insertion.

These sound simple. They aren’t.
Different ANN structures handle writes in completely different ways — and their behavior under load defines the architecture around them.
Let’s walk through what actually happens under the hood.

2. HNSW Writes: Modifying a Live Graph

HNSW isn’t a tree, or a cluster, or a table. It’s a multi-layer graph with strict structural invariants. When you insert a single vector, HNSW must:

connect it to its nearest neighbors
place it in a hierarchy where upper layers are sparse and lower layers dense
ensure no layer becomes disconnected
update multiple candidate lists along the path

This is not a constant-time operation. It’s not even predictable.

Why HNSW Writes Are Slow

Because the graph must maintain high recall, every write involves:

repeated distance computations
adjusting edges
preserving graph navigability
ensuring top-level shortcuts remain meaningful

If you insert too quickly, the graph “churns” — quality drops, recall drops, and the system starts reorganizing itself behind the scenes.

This is why real-world HNSW deployments almost always adopt the same pattern:
HNSW is best served as a mostly-static graph, requiring separate ingestion lanes and periodic full merges to manage churn. It’s the only way to avoid corrupting recall under sustained write pressure.

3. IVF Writes: Where Predictability Comes From

IVF takes the opposite approach. Instead of maintaining a global structure, IVF says:

“We’re going to cluster the vector space once…”
“…and every new vector goes to the nearest centroid.”

So a write is:

Embed vector
Find nearest centroid (K comparisons)
Append to that cluster
Update local metadata

That’s it. No global edges, no multi-level rebalancing, no graph mutation.

Why IVF Writes Are Fast

Appending to a cluster is structurally simple. This makes IVF the backbone of distributed vector databases like Pinecone and Weaviate. Writes scale horizontally, shards map directly to clusters, and routing stays predictable.

The tradeoff is cluster drift — if your embedding distribution changes enough, vectors that used to belong together don't anymore. Think of it like sorting emails into folders based on keywords ten years ago. If the underlying vocabulary or product line has fundamentally shifted, the old folders are now a poor map of the new incoming vectors.
And when that happens, recall quietly drops… until you re-cluster.

4. PQ/OPQ Writes: The Most Expensive Pipeline in Vector Search

If HNSW is slow because it maintains a graph, PQ is heavy because it maintains a compression model.

A PQ write looks like:

Embed vector
Split into sub-vectors
For each sub-vector, find the nearest “prototype” in its codebook
Convert that into an integer code
Store those codes on SSD in a compressed layout
Update metadata for decompression, re-ranking, and refinement

If you’re using OPQ, there’s an extra step:

rotate the vector using a learned transformation matrix before quantizing

This single matrix multiply adds substantial cost per write.

Why PQ Writes Hurt

Codebook lookup = expensive
Quantization = lossy
Storage layout = SSD-sensitive
Updating codebooks = expensive
Maintaining recall = tricky

Which is why PQ is rarely used for ❌ real-time ingestion ❌ high-write systems ❌ rapidly evolving datasets.
But it is perfect for: ✔ billion-scale catalogs ✔ archive systems ✔ low-write, high-read workloads where memory footprint is the primary constraint.
PQ is a power tool — but not a general-purpose ingestion engine.

5. Why Reindexing Is a Cluster-Wide Event (Not Maintenance)

Reindexing sounds simple:
“Let’s rebuild the index with the new embeddings.”

In a dense vector system, this is the equivalent of saying:
“Let’s rebuild the entire distributed search engine, all routing metadata, all clusters, all centroids, and all replicas.”
Furthermore, the index build itself is a massive, resource-intensive compute job that can saturate CPU and memory across an entire cluster.

Reindexing happens when:

You change the embedding model
New model = new geometry.
All distances change.
Cluster boundaries change.
Graph edges become invalid.
Codebooks stop matching.
Everything breaks, so everything must be rebuilt.
You change dimensionality
Moving from 768 → 1536 dims means:
- different memory profile
- different distance distribution
- different quantization shape
  Again: rebuild.
Your domain changes (vector drift)
Embeddings drift naturally as:
- product lines change
- vocabulary shifts
- user-generated content evolves
  This breaks clusters and codebooks over time.
You need better recall or lower latency
Sometimes you must:
- increase K (cluster count)
- adjust HNSW link factor (M)
- recompute codebooks
  Every one of these implies a rebuild.

6. Why You Cannot Reindex In Place

A production vector DB must behave like a distributed search engine: You need:

Versioned indexes (v1, v2, v3…)
Dual-write (ingest into both indexes)
Dual-read (compare quality, run shadow queries)
Background rebuilds (no downtime)
Cutovers (switch traffic atomically)
Rollback paths (if v2 performs worse)

This is almost identical to:

Lucene reindexing
search cluster migrations
schema-level DB migrations

Except vector DBs have:

heavier compute
more sensitive distribution
larger data volumes
more complex routing metadata

Reindexing is not an ML operation. It is a full distributed systems migration.

7. The Long-Tail Problem: Where Quality Quietly Degrades

Even when your writes “work,” the system accumulates small structural imperfections from inserts, updates, and deletes:

HNSW edges degrade
IVF clusters drift
PQ codebooks become stale
Routing becomes less accurate
Shards grow unevenly
Replicas diverge

Updates and deletes rarely happen in-place; they often require expensive marker flags, increasing read latency, and necessitating periodic compaction or full index rebuilds. This degradation is slow, quiet, and dangerous.

A healthy vector system must run:

background refinement jobs
centroid realignment
codebook drift correction
shard rebalancing
compaction tasks
ingestion burst smoothing

These tasks keep search quality predictable.

Final Takeaway

Search is easy. Writing is hard. If you ignore write-path design, you don’t have a vector architecture — you have a vector demo. A production vector database must treat the write path as a first-class architectural layer, because:

graphs mutate unpredictably (HNSW)
clusters drift (IVF)
codebooks decay (PQ/OPQ)
embeddings shift (drift + new models)
routing depends on metadata that must stay consistent

Dense vector search begins with embeddings… but it succeeds or fails on the write path.