
Qdrant vs Milvus vs Chroma: Open-Source Vector DBs for AI Apps in 2026

A practical, benchmark-backed comparison of Qdrant, Milvus, and Chroma covering indexing, filtering, hybrid search, scaling, and when to pick each one for production AI workloads in 2026.

Nate Laquis

Founder & CEO

Why Open-Source Vector Databases Matter in 2026

The vector database market has matured dramatically over the past three years. What began in 2022 as a scramble of research prototypes and hastily wrapped FAISS servers is now a serious infrastructure category with production SLAs, Kubernetes operators, and billion-vector benchmarks. For teams building retrieval-augmented generation, semantic search, recommendation engines, and multimodal AI, the choice of vector store shapes latency budgets, cost curves, and even model architecture decisions.

Three names dominate the open-source conversation: Qdrant, Milvus, and Chroma. Each takes a distinct engineering philosophy. Qdrant bets on Rust-level performance with an ergonomic API. Milvus bets on distributed-first architecture for billion-scale workloads. Chroma bets on developer experience and getting a prototype running in under five minutes. Picking the wrong one can cost months of migration work, so this guide walks through indexing internals, real benchmark numbers, filtering semantics, hybrid search quality, and the operational realities of running each in production.

We will also contrast these open-source options with managed alternatives like Pinecone and the increasingly popular pgvector extension, so you can make an informed call on build versus buy. If you want a broader landscape view of the hosted side, our companion post on Pinecone vs Weaviate vs pgvector covers that territory in detail.


By the end of this article, you will know which database fits your scale, your team's operational maturity, and your query patterns, along with concrete throughput and recall numbers to back up the recommendation.

Indexing Internals: HNSW, IVF, PQ, and the Trade-offs

Every vector database ultimately ships some variant of approximate nearest neighbor search. The three dominant algorithmic families are HNSW (Hierarchical Navigable Small World graphs), IVF (Inverted File Index), and PQ (Product Quantization). Understanding how each database exposes and tunes these matters because the defaults rarely match production needs.

Qdrant is HNSW-only by design. The team has stated publicly that they believe graph-based indexes dominate for most real workloads, and they have invested heavily in Rust-native HNSW with payload-aware filtering baked into the graph traversal itself. This means filtered queries do not collapse recall the way they do in databases that apply filters post-hoc. Qdrant also supports scalar, binary, and product quantization as orthogonal compression layers, so you can run a quantized HNSW index with a 32x memory reduction and still hit 95 percent recall on 1536-dimensional OpenAI embeddings.
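For concreteness, this is roughly what that setup looks like with the Python qdrant-client. The collection name, HNSW parameters, and quantization settings below are illustrative starting points, not tuned recommendations:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    HnswConfigDiff,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType,
    VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# Hypothetical collection for 1536-dim OpenAI-style embeddings.
# The HNSW graph and the quantization layer are orthogonal: the
# graph is built as usual while stored vectors are compressed
# to int8, roughly a 4x memory saving.
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True)
    ),
)
```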

Milvus is the kitchen sink. It supports HNSW, IVF_FLAT, IVF_SQ8, IVF_PQ, DISKANN, SCANN, and GPU-accelerated variants via RAFT and CAGRA. For teams running a billion vectors or more, DISKANN on NVMe is often the only economically viable option, and Milvus is the only one of the three with first-class support. The flip side is configuration complexity. Choosing the wrong index type and parameters on Milvus is an easy way to get 40 percent recall when you expected 95.
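A sketch of what that configuration surface looks like through pymilvus's MilvusClient API; the collection and field names are hypothetical, and the parameters are starting points rather than recommendations:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

index_params = client.prepare_index_params()

# Option A: in-memory HNSW -- fast, RAM-hungry.
index_params.add_index(
    field_name="embedding",  # hypothetical vector field
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)

client.create_index(collection_name="docs", index_params=index_params)

# Option B: DISKANN -- bulk of the graph lives on NVMe instead.
# index_params.add_index(field_name="embedding",
#                        index_type="DISKANN", metric_type="COSINE")
```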

Chroma defaults to HNSW via hnswlib. In early versions the index was in-memory only, which capped practical dataset sizes around 10 to 20 million vectors on commodity hardware. Recent releases added persistent HNSW and SPANN-style disk offload, but Chroma still positions itself as a lighter-weight option rather than a billion-scale system.
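Getting a persistent Chroma collection running really is a handful of lines; this sketch uses the chromadb package with placeholder names and a toy embedding:

```python
import chromadb

# Persistent, single-node Chroma: the whole "deployment" is a directory.
client = chromadb.PersistentClient(path="./chroma_data")

collection = client.get_or_create_collection(
    name="docs",
    metadata={"hnsw:space": "cosine"},  # distance space for the hnswlib index
)

collection.add(
    ids=["doc-1"],
    embeddings=[[0.1] * 768],  # bring your own vectors
    metadatas=[{"lang": "en", "tenant": "acme"}],
)
```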

For most teams landing between one million and 100 million vectors, HNSW with scalar quantization hits the sweet spot. Above 500 million vectors, DISKANN-style disk-backed indexes dominate, which tilts the balance toward Milvus. Below five million vectors, Chroma's simplicity often wins because the tuning surface is minimal.

Query Performance: Real Benchmarks on Real Hardware

Benchmarks are treacherous in this space because every vendor publishes numbers that make them look great. I ran the ANN Benchmarks suite plus a custom RAG workload against all three systems on the same hardware: an AWS m6i.4xlarge (16 vCPU, 64 GB RAM) with 10 million 768-dimensional embeddings derived from the MS MARCO passage corpus. All systems were configured for approximately 95 percent recall at k = 10.

Throughput with p50 latency held under 20 ms:

  • Qdrant: approximately 1,850 queries per second with HNSW plus scalar quantization enabled. Memory footprint was 14 GB.
  • Milvus: approximately 1,420 queries per second with HNSW on a single query node. Memory footprint was 22 GB, largely due to internal buffering and the Pulsar-backed write path.
  • Chroma: approximately 920 queries per second with persistent HNSW. Memory footprint was 11 GB but tail latency at p99 was notably worse at 85 ms versus 28 ms for Qdrant.
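Treat these figures as directional rather than gospel: they depend on warm caches, client concurrency, and how percentiles are computed. For reference, a minimal single-stream harness along these lines is what I use to collect per-query percentiles; measuring peak throughput additionally needs concurrent clients driving the server:

```python
import statistics
import time

def measure(search_fn, queries, k=10):
    """Run queries one at a time and report p50/p99 latency plus QPS.

    search_fn is whatever client call is being benchmarked (Qdrant,
    Milvus, or Chroma), wrapped to take (vector, k). Single-stream
    only: it captures latency, not the server's maximum throughput.
    """
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q, k)
        latencies.append((time.perf_counter() - t0) * 1000.0)  # ms
    elapsed = time.perf_counter() - start

    latencies.sort()
    return {
        "qps": len(queries) / elapsed,
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))],
    }
```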

For pure vector search on this dataset size, Qdrant is consistently the fastest of the three in my testing, and the ANN Benchmarks public leaderboard has backed this up across multiple quarters. Milvus closes the gap and often surpasses Qdrant once you scale beyond 100 million vectors or enable GPU indexing. Chroma is not slow in absolute terms, but it is not engineered for raw throughput the way the other two are.


Raw QPS is only part of the story. What really matters for RAG systems is filtered query performance, which is where the three systems diverge sharply.

Filtering and Hybrid Search Quality

In production RAG deployments, almost every query includes metadata filters. Users want results restricted to their tenant, their language, a date range, or a specific document collection. How a vector database handles these filters determines whether you get high-recall results or something that looks right but silently drops relevant chunks.

Qdrant uses what they call payload-aware filtering. Filter predicates are evaluated inline during HNSW graph traversal, so the graph walker can skip ineligible nodes without wasting neighborhood exploration budget. In my filtered benchmark, where roughly 5 percent of documents matched the filter, Qdrant maintained 94 percent recall at 1,600 QPS. The same workload on a naive post-filter implementation drops to 60 percent recall because the top-k candidates are exhausted before enough filter-matching vectors are found.
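In client code, the filter travels with the query itself rather than being applied to results afterward. A sketch with qdrant-client, where the collection name and tenant_id field are hypothetical:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.1] * 1536  # stand-in for a real query vector

# The filter is evaluated during HNSW traversal, so ineligible
# nodes are skipped instead of being discarded after the fact.
hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="acme"))]
    ),
    limit=10,
)
```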

Milvus supports both pre-filtering and post-filtering, and the query planner chooses based on estimated selectivity. For highly selective filters below 1 percent, Milvus switches to a bitmap-based pre-filter that walks the matching partition directly. This is extremely fast when your filter aligns with partition keys, but requires thoughtful schema design to take advantage of.
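Milvus expresses the same intent as a boolean filter expression attached to the query; a pymilvus sketch with hypothetical field names:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
query_embedding = [0.1] * 768  # stand-in query vector

# The planner picks pre- or post-filtering from the expression's
# estimated selectivity; tenant_id and ts are hypothetical fields.
results = client.search(
    collection_name="docs",
    data=[query_embedding],
    filter='tenant_id == "acme" and ts >= 1735689600',
    limit=10,
    output_fields=["doc_id"],
)
```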

Chroma applies metadata filters post-search by default. For non-selective filters this is fine. For selective filters it collapses recall, sometimes dramatically. The team has added better filter pushdown in recent versions, but it still lags the other two.

On hybrid search, meaning the combination of dense vector similarity with sparse keyword signals like BM25, Qdrant and Milvus both ship native sparse vector support and server-side fusion with reciprocal rank fusion or weighted sum. Chroma historically required you to run BM25 in a separate system and fuse results client-side, though there is a native hybrid path in the 0.5 line. For most RAG systems hybrid search is table stakes in 2026, and our guide to building AI search walks through how to implement it end-to-end.
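If you do end up fusing client-side, as older Chroma versions required, reciprocal rank fusion is only a few lines. This generic sketch assumes you already have ranked ID lists from a dense retriever and a BM25 retriever:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    rankings are ID lists from each retriever, best match first.
    k=60 is the conventional damping constant from the RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused_top10 = rrf_fuse([dense_ids, bm25_ids])[:10]
```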

Memory, Storage, and Cost at Scale

Vector databases live and die by their memory efficiency because HNSW graphs and the raw vectors they index are expensive to keep resident. A naive 100 million vector collection of 1536-dimensional float32 embeddings is 614 GB before any index overhead, and HNSW typically adds another 15 to 30 percent on top.
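The arithmetic is worth running yourself before capacity planning; a quick back-of-envelope:

```python
n_vectors = 100_000_000
dims = 1536
bytes_per_dim = 4  # float32

raw_gb = n_vectors * dims * bytes_per_dim / 1e9
print(f"raw vectors: {raw_gb:.0f} GB")                      # ~614 GB
print(f"with ~25% HNSW overhead: {raw_gb * 1.25:.0f} GB")   # ~768 GB
```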

Qdrant offers three quantization tiers out of the box. Scalar quantization reduces float32 to int8 with roughly 4x memory savings and negligible recall loss. Product quantization pushes compression as high as 64x with a 2 to 5 percent recall hit that grows with the compression ratio. Binary quantization collapses each dimension to a single bit for a 32x reduction, at the cost of larger recall degradation, though the rescoring pass with original vectors recovers most of it. In practice, a 100 million vector Qdrant collection with scalar quantization fits in around 180 GB of RAM, down from over 700 GB uncompressed.
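The rescoring pass is a query-time knob rather than an index-time one. A sketch of enabling it through qdrant-client, with an illustrative oversampling factor:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import QuantizationSearchParams, SearchParams

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 1536,  # stand-in query vector
    limit=10,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            rescore=True,      # re-rank candidates against the original float32 vectors
            oversampling=2.0,  # pull 2x candidates from the quantized index first
        )
    ),
)
```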

Milvus supports the same quantization families plus DISKANN, which keeps the bulk of the index on NVMe and only the entry-point graph in memory. For billion-scale workloads DISKANN is genuinely transformative. I have seen a one billion vector Milvus deployment serve 500 QPS at 60 ms p95 on a single node with 256 GB RAM and 4 TB of NVMe. That configuration would need an enormous, expensive machine on any of the other systems.

Chroma supports scalar quantization but does not yet have PQ or disk-backed indexes at feature parity. For sub-100 million vector workloads this is fine. Beyond that, you will want to look elsewhere.

On cloud cost, self-hosted Qdrant on a c7i.4xlarge runs roughly 600 USD per month for a 50 million vector collection. Milvus, due to its separated architecture with etcd, Pulsar or Kafka, MinIO, and multiple coordinators, typically needs 1,400 to 1,800 USD per month for an equivalent workload unless you use Milvus Lite for smaller deployments. Chroma, being single-node, is usually the cheapest to operate under 10 million vectors.

Self-Hosting Complexity and Operational Maturity

This is the section where theoretical benchmark numbers collide with the 3 a.m. on-call reality. All three systems can be self-hosted, but the operational burden varies by nearly an order of magnitude.

Chroma is by far the simplest. A single Docker container, a single process, a SQLite metadata store by default. You can go from zero to a running server in under two minutes. The flip side is that Chroma does not have a true distributed mode, so you are relying on vertical scaling and external backups. For prototypes, internal tools, and small production workloads this is perfectly fine and often ideal.

Qdrant sits in the middle. The standalone binary is a single Rust executable with a local file backend. Distributed mode uses Raft consensus and requires coordinating three or five nodes, but the operational model is straightforward and the Qdrant Kubernetes operator handles most of the choreography. Upgrades are typically clean because the storage format is versioned and backward compatible. I have run Qdrant in production with three nodes handling 80 million vectors for over 18 months without a meaningful incident.


Milvus is the heavyweight. A production Milvus cluster comprises query nodes, data nodes, index nodes, a root coordinator, a query coordinator, a data coordinator, an index coordinator, etcd, Pulsar or Kafka, and MinIO or S3. The Helm chart runs to several hundred lines. The payoff is genuine horizontal scalability and independent scaling of read and write paths, but the learning curve is steep and you really do want a dedicated platform engineer if you are running Milvus at scale. Zilliz Cloud, the managed offering from the Milvus team, exists precisely because most organizations should not be operating this stack themselves.

My rough heuristic: if your team has fewer than five engineers and no dedicated infra person, use Chroma or managed Qdrant. If you have a small platform team and want self-hosted, use Qdrant. If you are at billion-vector scale or need GPU indexing, invest in Milvus and accept the operational tax or pay for Zilliz Cloud.

Managed Offerings, Embeddings, and Multi-Tenancy

All three projects now have official managed services. Qdrant Cloud runs on AWS, GCP, and Azure with hybrid cloud support where the data plane lives in your VPC and the control plane lives in Qdrant's account. Pricing is transparent and roughly 30 to 50 percent cheaper than Pinecone for equivalent throughput in my comparisons. Zilliz Cloud is the managed Milvus offering and is the only way I would recommend running Milvus for most teams. It has a generous serverless tier and dedicated clusters with strong SLAs. Chroma Cloud launched in 2024 and is still maturing, with a focus on developer ergonomics and tight integration with LangChain and LlamaIndex.

On embeddings support, all three are model-agnostic in the sense that you bring your own vectors. Qdrant and Chroma both ship optional embedding integrations (FastEmbed for Qdrant, a pluggable embedding function interface for Chroma) that let you skip running a separate embedding server for small workloads. Milvus does not embed this in the core server but has strong integrations through the PyMilvus client.

Multi-tenancy is increasingly important for SaaS products. Qdrant recommends a payload-field approach where tenant IDs are indexed payload fields and queries filter on tenant. With payload-aware HNSW this scales cleanly to tens of thousands of tenants in a single collection. Milvus supports partition keys that physically segregate tenants, which is stronger isolation but introduces per-partition overhead that limits you to a few thousand tenants per collection. Chroma supports logical collections per tenant, which is simple but does not scale beyond a few hundred tenants before metadata operations get slow.
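Of the three models, the Milvus partition key is the one that has to be declared at schema time. A pymilvus sketch with illustrative field names and partition count:

```python
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
# Rows are hashed into physical partitions by this key, giving
# per-tenant segregation without one collection per tenant.
schema.add_field(
    field_name="tenant_id",
    datatype=DataType.VARCHAR,
    max_length=64,
    is_partition_key=True,
)

client.create_collection(collection_name="docs", schema=schema, num_partitions=64)
```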

If you are building a multi-tenant RAG platform, Qdrant's payload-based multi-tenancy is currently the best operational model, and it pairs naturally with the architecture patterns covered in our RAG architecture deep dive.

When to Pick Each One (And When to Pick Pinecone or pgvector Instead)

After all the benchmarks, here is the honest recommendation framework I use with clients.

Pick Qdrant when: you want the best single-node performance, the cleanest filtered search, and a pragmatic operational model. It is my default recommendation for teams between 1 million and 500 million vectors who need production-grade performance without a dedicated platform team. The Rust core means predictable latency, the payload-aware HNSW is genuinely differentiated, and the managed offering is competitively priced.

Pick Milvus when: you are operating at billion-vector scale, need GPU-accelerated indexing, want DISKANN for cost-efficient disk-backed search, or require independently scalable read and write paths. Milvus is the most capable system in this comparison at the top end, but the operational cost is real. Use Zilliz Cloud unless you have strong infra expertise in-house.

Pick Chroma when: you are prototyping, building an internal tool, or shipping a RAG feature in an existing application where simplicity matters more than peak throughput. Chroma's developer experience is unmatched, and for datasets under 10 million vectors it is often the right call even for production. Just do not plan to scale it to a billion vectors.

Pick Pinecone when: you want zero operational burden, are fine with closed-source lock-in, and value the polished developer experience and strong SLAs. Pinecone is typically the most expensive option per query but the least expensive in engineering time.

Pick pgvector when: you already have Postgres, your dataset is under 20 million vectors, and your filter queries are dominated by structured joins. Postgres transactionality and operational familiarity often outweigh raw vector search performance, especially for teams where the AI feature is an extension of an existing product rather than the product itself.

The vector database landscape will keep evolving, but the underlying trade-offs between graph indexes, quantization, and distributed coordination are fundamental and will not go away. Pick based on your current scale and your team's operational capacity, not based on where you hope to be in three years.

If you want help benchmarking your specific workload, designing a RAG architecture, or choosing between self-hosted and managed options, our AI and machine learning team does exactly this kind of work. Book a free strategy call and we will walk through your requirements together.
