Zvec: Alibaba's Open-Source In-Process Vector Database (2026)

Zvec is Alibaba's battle-tested, open-source in-process vector database. It searches billions of vectors in milliseconds, requires no server setup, and offers WAL persistence, hybrid search, and Python, Node.js, and Flutter SDKs.

6 min read · Yash Thakker
Vector Database · AI Infrastructure · RAG · Open Source · Alibaba

Alibaba just open-sourced the vector database they've been running in production. Zvec — an in-process, embedded vector database — landed on GitHub in late 2025 and has already reached 9.9k stars. The v0.5.0 release dropped in June 2026, adding WAL guarantees, libaio support, and prefetch configuration for tuning search latency.

The pitch is straightforward: the speed of a native library, the durability of a real database, and zero server setup. If you've ever added Chroma or Qdrant to a project just to get local vector search and felt like it was too much infrastructure for what you needed — Zvec is the answer.


What Is Zvec?

Zvec is an in-process vector database — a library that runs inside your application rather than as a separate server. You import it, open a collection (a directory on disk), insert vectors, and query them. There is no daemon to start, no port to configure, no connection pooling to manage.

This puts it in the same category as SQLite, not Postgres. The analogy is intentional: just as SQLite gave developers a production-grade relational database they could embed in any app, Zvec gives developers a production-grade vector database with the same zero-infrastructure profile.

Alibaba has been running it internally across multiple production workloads before open-sourcing it, which means the battle-testing has already happened at serious scale.


Why In-Process Matters

Most popular vector databases (Qdrant, Weaviate, Chroma in server mode, Milvus) require running a separate process. That's fine for teams with dedicated infra, but it creates real friction for:

  • Local development — you need Docker or a running daemon before writing a single line of retrieval code
  • Edge / mobile — shipping a server process to Android or iOS is not practical
  • CLI tools and notebooks — standing up infrastructure for a script is over-engineering
  • Serverless functions — ephemeral environments make external connections expensive

Zvec eliminates all of these friction points. Since it's just a library, it starts the instant your process starts and shuts down cleanly when your process exits.

For context on where in-process vector search fits in the broader RAG landscape, see our comparison of RAG vs MCP for context-aware AI systems.


Core Features

Dense + Sparse Vector Support

Zvec handles both dense vectors (typical embeddings from models like text-embedding-3) and sparse vectors (BM25-style term weights from models like SPLADE). Both can be queried in a single call, enabling true hybrid retrieval without a separate keyword search layer.
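
To make the idea concrete, here is a minimal pure-Python sketch of scoring one query against documents that carry both a dense and a sparse vector. It does not use Zvec. The names `cosine`, `sparse_dot`, `hybrid_score`, and the `alpha` weight are illustrative assumptions, and Zvec's actual fusion strategy may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term_id: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """Weighted sum of dense and sparse similarity (alpha is an assumed knob)."""
    return alpha * cosine(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)

# Each document: (dense vector, sparse vector)
docs = {
    "doc_1": ([1.0, 0.0], {7: 2.0, 11: 1.0}),
    "doc_2": ([0.0, 1.0], {3: 1.5}),
}
query_dense, query_sparse = [1.0, 0.0], {7: 1.0}

ranked = sorted(
    docs,
    key=lambda doc_id: hybrid_score(query_dense, query_sparse, *docs[doc_id]),
    reverse=True,
)
print(ranked)  # ['doc_1', 'doc_2']
```

A weighted sum is only one way to fuse the two signals; rank-based fusion is another common choice.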

Hybrid Search with Filters

Combine semantic similarity with structured filters in one query. This matters for most real-world retrieval tasks where you want "most semantically similar to X" combined with "must have field Y = value Z."
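
As a rough illustration of what "similarity plus a structured filter" means, the sketch below does a brute-force filtered top-k in plain Python. It is not Zvec's implementation: `filtered_topk` and the document layout are hypothetical, and a real engine would apply the filter inside its ANN index rather than scanning every document.

```python
import heapq

def filtered_topk(query, docs, predicate, k):
    """Return (id, score) for the k best docs whose fields satisfy predicate."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Score only the documents that pass the structured filter.
    candidates = (
        (dot(query, d["vector"]), d["id"])
        for d in docs
        if predicate(d["fields"])
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, candidates)]

docs = [
    {"id": "a", "vector": [0.9, 0.1], "fields": {"category": "tech"}},
    {"id": "b", "vector": [1.0, 0.0], "fields": {"category": "finance"}},
    {"id": "c", "vector": [0.5, 0.5], "fields": {"category": "tech"}},
]

hits = filtered_topk([1.0, 0.0], docs, lambda f: f["category"] == "tech", k=2)
print([doc_id for doc_id, _ in hits])  # ['a', 'c']
```

Document "b" is the closest match by similarity alone, but the filter excludes it before ranking.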

WAL Persistence

Write-ahead logging means every insert is durable before it returns. Zvec survives process crashes and power failures without data loss — which separates it from pure in-memory libraries like FAISS.
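
The general write-ahead pattern can be sketched in a few lines of standard-library Python. This is a toy illustration of the technique, not Zvec's on-disk format: `TinyWalStore` is a hypothetical class, and a production WAL would add checksums, torn-write handling, and log compaction.

```python
import json
import os
import tempfile

class TinyWalStore:
    """Toy store that logs every insert to disk before applying it in memory."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Replay: rebuild in-memory state from the log left by a previous run.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.data[record["id"]] = record["vector"]
        self.log = open(path, "a", encoding="utf-8")

    def insert(self, doc_id, vector):
        # 1. Append the record to the log and force it to stable storage.
        self.log.write(json.dumps({"id": doc_id, "vector": vector}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Only then update the in-memory state and return to the caller.
        self.data[doc_id] = vector

    def close(self):
        self.log.close()

wal_path = os.path.join(tempfile.mkdtemp(), "collection.wal")

store = TinyWalStore(wal_path)
store.insert("doc_1", [0.1, 0.2])
store.insert("doc_2", [0.3, 0.4])
store.close()  # stand-in for a crash: the in-memory dict is thrown away

recovered = TinyWalStore(wal_path)  # a fresh instance replays the log
print(recovered.data)  # {'doc_1': [0.1, 0.2], 'doc_2': [0.3, 0.4]}
recovered.close()
```

The key ordering is in `insert`: the record reaches disk before the call returns, so anything the caller was told succeeded can be rebuilt on the next open.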

Multi-Process Read Concurrency

Multiple processes can open the same Zvec collection simultaneously for reads. Only writes require exclusive single-process access. This makes it viable for multi-worker serving scenarios.
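
One common way to get "many readers, one writer" behavior across processes is an advisory file lock. The sketch below shows that pattern with `fcntl.flock` on Unix-like systems. It is an assumption for illustration only: this article does not describe how Zvec coordinates processes internally, and `open_locked` and the lock file name are hypothetical. Separate `open()` calls stand in for separate processes here, since `flock` treats each open file independently.

```python
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "collection.lock")

def open_locked(path, exclusive):
    """Open the lock file with a shared (reader) or exclusive (writer) lock.

    Raises BlockingIOError if the lock cannot be granted immediately.
    """
    f = open(path, "a+")
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(f, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        raise
    return f

# Two readers can hold the shared lock at the same time.
reader_a = open_locked(lock_path, exclusive=False)
reader_b = open_locked(lock_path, exclusive=False)

# A writer is refused while any reader still holds the lock.
try:
    open_locked(lock_path, exclusive=True)
    writer_blocked = False
except BlockingIOError:
    writer_blocked = True

# Once the readers are gone, the writer gets exclusive access.
reader_a.close()
reader_b.close()
writer = open_locked(lock_path, exclusive=True)
writer_acquired = True
writer.close()

print(writer_blocked, writer_acquired)  # True True
```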

Indexing: HNSW and DiskANN

Zvec supports HNSW (the standard graph-based ANN algorithm) and DiskANN (Microsoft's disk-based index that enables billion-scale search on commodity hardware with limited RAM). DiskANN support was added in an earlier release cycle.
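
A quick back-of-envelope calculation shows why a disk-based index matters at this scale. The sketch below only counts raw vector storage and ignores index overhead. The helper `raw_vector_bytes` is illustrative, and the 1536-dimension figure is borrowed from the Quick Start schema rather than from any Zvec benchmark.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component=4):
    """Bytes needed to hold the raw vectors alone (no index overhead)."""
    return num_vectors * dim * bytes_per_component

fp32 = raw_vector_bytes(1_000_000_000, 1536)      # float32 components
int8 = raw_vector_bytes(1_000_000_000, 1536, 1)   # one byte per component

print(f"float32: {fp32 / 1e12:.3f} TB")  # float32: 6.144 TB
print(f"8-bit:   {int8 / 1e12:.3f} TB")  # 8-bit:   1.536 TB
```

Even with one byte per component, a billion 1536-dimensional vectors will not fit in the RAM of a typical machine, which is the gap a disk-resident index is meant to fill.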


Quick Start

import zvec

# Placeholder embeddings so the snippet runs end to end;
# replace these with real 1536-dimensional model output.
embedding_1 = [0.1] * 1536
embedding_2 = [0.2] * 1536
query_embedding = [0.1] * 1536

# Define schema
schema = zvec.CollectionSchema(
    name="docs",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 1536),
)

# Open (creates if not exists)
collection = zvec.create_and_open(path="./my_collection", schema=schema)

# Insert
collection.insert([
    zvec.Doc(id="doc_1", vectors={"embedding": embedding_1}, fields={"category": "tech"}),
    zvec.Doc(id="doc_2", vectors={"embedding": embedding_2}, fields={"category": "finance"}),
])

# Query with filter
results = collection.query(
    zvec.VectorQuery("embedding", vector=query_embedding),
    topk=10,
    filters={"category": "tech"},
)

print(results)  # [{'id': 'doc_1', 'score': 0.94, ...}, ...]

Node.js is also supported via npm install @zvec/zvec — the API mirrors the Python SDK.


What's New in Recent Releases

v0.5.0 (June 2026)

  • libaio support for Linux async I/O (lower read latency on DiskANN workloads)
  • Prefetch configuration exposed as search params (PO and PL) for tuning I/O prefetch depth during ANN search
  • Further compiler warning fixes with -Werror across all CI platforms

v0.4.0 (May 2026)

  • Dart/Flutter SDK: official package with FFI bindings for Android (arm64-v8a) and iOS (arm64) — no manual native compilation required
  • iOS build support: expanding cross-platform coverage to Apple mobile
  • Enlarged topK limit: relaxed the upper bound for larger recall scenarios
  • Bug fixes: SQ8 quantizer recall drop, Windows path handling, sparse vector index ordering

Earlier milestones

  • DiskANN index support (v0.3.x era)
  • Full-text search (FTS) support
  • SQ8 scalar quantization
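
For readers unfamiliar with SQ8, the idea is to store each vector component in 8 bits instead of 32. The sketch below shows one simple variant that uses a per-vector min/max range. It is an illustration of the general technique, not Zvec's quantizer, which may compute its ranges differently; `sq8_encode` and `sq8_decode` are hypothetical names.

```python
def sq8_encode(vector):
    """Map each float to an integer code in 0..255 using the vector's min/max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    """Reconstruct approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]

original = [-1.0, -0.3, 0.1, 0.6, 1.0]
codes, lo, scale = sq8_encode(original)
restored = sq8_decode(codes, lo, scale)

# Each component is off by at most about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(len(codes), "bytes instead of", 4 * len(original))
print(f"max error {max_error:.5f}, step {scale:.5f}")
```

The storage drops by 4x relative to float32, at the cost of a small reconstruction error per component.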

How It Compares

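| | Zvec | FAISS | Chroma | Qdrant |
|---|---|---|---|---|
| In-process | Yes | Yes | Optional | No |
| Persistence (WAL) | Yes | No | Yes | Yes |
| Hybrid search | Yes | No | Limited | Yes |
| Sparse vectors | Yes | No | No | Yes |
| Mobile support | Yes (Flutter) | No | No | No |
| Multi-process reads | Yes | No | No | N/A |
| DiskANN | Yes | No | No | No |
| Setup required | None | None | Server optional | Server required |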

For a deeper dive into where vector search fits vs. keyword retrieval in code contexts, see RAG vs Agentic RAG: why search beats embeddings for code retrieval.


Performance

Alibaba benchmarks Zvec on billion-scale datasets. The headline claims are millisecond-level search across billions of vectors with the HNSW index, and disk-based search at larger scale with DiskANN. The project publishes its full benchmark methodology, configurations, and results on its docs site.

For comparison, Google's own vector compression work (TurboVec/TurboQuant) recently demonstrated compressing 10M vectors from 31GB to 4GB — a different angle on the same infrastructure problem of making vector search practical at scale. See our coverage of Google TurboVec and TurboQuant for how the tradeoffs compare.


Who Should Use Zvec

Good fit:

  • Python or Node.js apps that need local vector search without infrastructure overhead
  • RAG pipelines in notebooks, scripts, or serverless functions
  • Mobile apps (Flutter, via the Dart SDK's FFI bindings) that need on-device semantic search
  • Production services where you want embedded, not networked, vector retrieval
  • Teams who want DiskANN's disk-based billion-scale search without Milvus's operational complexity

Less ideal:

  • Multi-tenant SaaS needing shared vector infrastructure across many isolated users
  • Workloads requiring horizontal write scaling across many nodes
  • Teams already invested in a managed vector database with cloud-native features

Getting It

# Python
pip install zvec

# Node.js
npm install @zvec/zvec

Source is at github.com/alibaba/zvec under Apache 2.0. The project has 27 contributors, 8 releases, and active CI across Linux, macOS, Windows, Android, and iOS.


Zvec is the kind of infrastructure release that quietly becomes load-bearing in a lot of projects. In-process vector search with WAL durability, hybrid filtering, and mobile support fills a real gap — and the Alibaba production provenance means you're not betting on an untested project.
