Why Vector Search Got Rebuilt in 2025
Pinecone defined vector databases as a category from 2019 to 2022. Weaviate, Qdrant, Milvus, and Chroma followed with open-source alternatives. pgvector turned Postgres into a surprisingly capable vector store. By 2024, the space looked mature. Then Turbopuffer launched with an architecture that rethought the economics completely, and the entire space is still shaking out in 2026.
The thesis Turbopuffer pushed: in-memory vector databases (Pinecone, classic Weaviate) are wildly overpriced because they keep every vector in RAM all the time. For most workloads (semantic search, RAG, recommendation retrieval), the query patterns are sparse: 95% of users query 5% of the vectors. If you store vectors in object storage and only load what is needed, you can hit the same p95 latency at 1/10th the cost.
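The economics behind that thesis are easy to sketch with back-of-envelope arithmetic. The prices below (RAM at $3/GB-month, object storage at $0.023/GB-month) and the 5% hot fraction are illustrative assumptions, not any vendor's actual rates:

```python
# Back-of-envelope comparison: keep every vector in RAM vs. object storage
# plus a small in-memory cache for the hot subset. All prices here are
# illustrative assumptions, not vendor quotes.

VECTOR_BYTES = 768 * 4  # one 768-dim float32 embedding

def monthly_storage_cost(n_vectors, ram_price_gb=3.00, s3_price_gb=0.023,
                         hot_fraction=0.05):
    """Return (all_in_ram, object_storage_plus_cache) monthly USD cost."""
    gb = n_vectors * VECTOR_BYTES / 1e9
    all_ram = gb * ram_price_gb
    # Everything in object storage, RAM only for the hot ~5% of vectors.
    tiered = gb * s3_price_gb + gb * hot_fraction * ram_price_gb
    return all_ram, tiered

ram, tiered = monthly_storage_cost(100_000_000)
print(f"all-RAM: ${ram:,.0f}/mo  tiered: ${tiered:,.0f}/mo")
```

Under these assumptions, 100M vectors cost roughly 17x more to hold entirely in RAM than in a tiered layout, which is the gap the "1/10th the cost" claim is pointing at.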
By mid-2025 Turbopuffer was handling 100K+ QPS at production scale for companies like Lex, Elicit, and Deta. In 2026, the pricing gap has forced Pinecone to respond with serverless tiers, and pgvector has closed several feature gaps. The comparison is worth revisiting. For broader context, see our earlier vector DB comparison.
Turbopuffer: Object-Storage-Backed Vector Search
Turbopuffer stores vectors in object storage (S3, R2, GCS) with aggressive indexing and caching. Queries land on a Rust-based service that fetches only the needed data from object storage, caches hot paths in memory, and returns results at 50 to 150ms p95 for most workloads.
Strengths: Dramatically lower cost (roughly 1/10th of Pinecone per query at high scale), elastic scaling (add vectors without provisioning), strong hybrid search (BM25 plus vector in one query), namespaces for multi-tenancy at negligible cost even with millions of tenants, battle-tested at high QPS.
Weaknesses: Managed service only (no self-host option as of 2026), smaller ecosystem than Pinecone, cold namespace latency can be 500ms+ on first query (warms quickly after).
Pricing: $0.04 per million queries, $0.10 per GB stored per month. A 100M vector collection with 10K QPS runs roughly $300 to $800 per month depending on query patterns.
Best for: High-scale consumer apps, multi-tenant SaaS with millions of tenants, cost-sensitive RAG deployments, teams wanting the cheapest scale option.
pgvector: Postgres-Native Vector Search
pgvector turns Postgres into a vector database. You install the extension, add a vector column, create an HNSW or IVFFlat index, and query. No additional infrastructure. Benefits compound with the existing Postgres ecosystem.
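Those four steps map directly onto SQL. A minimal sketch of the workflow, with a hypothetical `documents` table (the statements are standard pgvector syntax; run them with any Postgres driver):

```python
# Minimal pgvector workflow as the SQL you would run. The table and column
# names are hypothetical; the statements themselves are standard pgvector
# syntax. Shown as Python strings so the steps read in order.

setup_sql = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(768)
);
-- HNSW index with cosine distance; m and ef_construction are the main
-- build-time tunables.
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# k-nearest-neighbor query: <=> is pgvector's cosine-distance operator.
query_sql = """
SELECT id, content, embedding <=> %(q)s AS distance
FROM documents
ORDER BY embedding <=> %(q)s
LIMIT 10;
"""

print(setup_sql)
print(query_sql)
```

Note that the `ORDER BY embedding <=> %(q)s` form is what lets the planner use the HNSW index; wrapping the distance expression in a function call can defeat it.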
Strengths: Single database for structured plus vector data, transactional ACID guarantees, mature ecosystem (migrations, backups, replication, monitoring all work), strong hybrid queries using SQL, free and open source, recent HNSW improvements approaching Pinecone performance.
Weaknesses: Scaling becomes painful past 10M to 50M vectors in a single table (insertion speed, index rebuild time, and query latency all degrade), requires Postgres expertise to tune for performance, lacks vector-native conveniences like namespaces (metadata filtering is plain SQL WHERE clauses, but it degrades at scale).
Pricing: Essentially free. You pay for Postgres hosting. Supabase, Neon, Aiven, Render, AWS RDS all support pgvector. For a 10M vector deployment, hosting is $50 to $400 per month on a mid-tier managed Postgres.
Best for: Startups and mid-market apps with under 10M vectors, teams already running Postgres, apps needing tight ACID with structured plus vector data, cost-sensitive deployments where self-hosting is acceptable.
Our RAG architecture guide covers where pgvector fits in the broader stack.
Pinecone: The Original Managed Vector DB
Pinecone is the enterprise incumbent. Its serverless tier launched in 2024, and a pod-based tier provides dedicated capacity. It ships a robust UI, SDKs, and enterprise features like RBAC, audit logs, SSO, and VPC peering.
Strengths: Most mature managed service, excellent developer experience, strong enterprise features (SOC 2, HIPAA BAA, VPC), predictable performance, well-documented SDKs for every major language, responsive support.
Weaknesses: Most expensive option at scale (pod-based pricing can run $500+ per month for moderate workloads), fewer native hybrid search primitives than Turbopuffer, namespaces have overhead that scales poorly to millions of tenants.
Pricing: Serverless: $0.25 per GB stored per month, $16 per million write operations, $16 per million read operations. Pod-based: $0.10 to $0.50 per hour per pod depending on tier.
Best for: Enterprise customers with strict compliance needs, teams that value predictable performance over cost optimization, mid-scale workloads (1M to 50M vectors), teams with budget for managed services.
Latency and Throughput Benchmarks
2026 benchmarks from internal testing on 1M vector collections (768-dim embeddings, k=10 queries):
- p50 query latency (warm cache): Turbopuffer 45ms, pgvector HNSW 12ms, Pinecone serverless 35ms, Pinecone pod 15ms.
- p99 query latency: Turbopuffer 180ms, pgvector HNSW 60ms, Pinecone serverless 150ms, Pinecone pod 40ms.
- Cold-start latency (first query to inactive namespace): Turbopuffer 400 to 800ms, pgvector N/A (always warm), Pinecone serverless 200 to 500ms, Pinecone pod N/A.
- Insertion throughput: Turbopuffer 50K inserts per second per client, pgvector 5K to 20K (depends on index), Pinecone 10K to 30K.
- Query throughput per client: Turbopuffer 10K QPS, pgvector 2K to 8K QPS, Pinecone 15K+ QPS on pods.
pgvector wins on warm-query latency when tuned. Pinecone wins on throughput. Turbopuffer wins on cost per query at scale. Benchmark on your own workload: results shift with index tuning, and default settings are not optimal for any of the three.
One pattern: at under 1M vectors, all three are fast. Differences only matter past 10M vectors or high QPS.
Hybrid Search: BM25 plus Vector
Pure vector search often underperforms hybrid search (BM25 keyword relevance combined with vector semantic relevance). Your choice of vector DB affects how easy hybrid is.
Turbopuffer: Native BM25 built into the query API. Combine with vector in one query. Weight tunable. This is the cleanest hybrid search of the three.
pgvector: Postgres has built-in full-text search (tsvector, ts_rank). Combine with vector search in SQL. Not as clean as Turbopuffer's single query but flexible. You can get sophisticated with CTEs and reciprocal rank fusion.
Pinecone: Sparse plus dense hybrid search available. Requires separate sparse index. Good when properly configured but more setup than Turbopuffer.
Reciprocal rank fusion (RRF) is the standard technique for combining rankings. All three can do it with some engineering. Turbopuffer is fastest to set up.
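RRF itself is a few lines: each document scores the sum of 1/(k + rank) across the ranked lists it appears in, with k conventionally set to 60. A minimal sketch (the document IDs are hypothetical):

```python
# Reciprocal rank fusion: merge several ranked result lists into one.
# score(d) = sum over lists of 1 / (k + rank_of_d), k conventionally 60.

def rrf(rankings, k=60):
    """rankings: list of ranked lists of doc ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["a", "b", "c", "d"]    # keyword ranking (hypothetical ids)
vector_hits = ["c", "a", "e", "b"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))  # → ['a', 'c', 'b', 'e', 'd']
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine distances live on incomparable scales.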
For most production RAG apps, hybrid search improves retrieval quality by 15 to 40% over pure vector search. Worth the effort.
See our Qdrant vs Milvus vs Chroma comparison for open-source alternatives we did not cover here.
Cost Modeling for Common Workloads
Real workload cost comparisons (including storage and compute, 768-dim vectors, 1M vectors at 100 QPS average with bursts to 1K):
- Turbopuffer: About $150 to $250 per month (100 QPS average, bursts to 1K).
- pgvector on Supabase: $25 to $100 per month (on a Pro tier Postgres instance).
- Pinecone serverless: $200 to $500 per month at moderate query volume.
- Pinecone pod (p1.x1): $70 per month base, scaling to $500+ with replicas and size.
At 100M vectors, 1K QPS:
- Turbopuffer: $500 to $1,500 per month.
- pgvector: Struggles at this scale. Would need careful sharding and a beefy instance ($500 to $2,500 per month).
- Pinecone: $2,000 to $8,000 per month depending on tier.
Turbopuffer's cost advantage widens at scale. pgvector hits a wall past 10M vectors. Pinecone is predictable but expensive.
Hidden costs: ingest operations are billed separately. Metadata storage and indexes add 10 to 40% overhead. Multi-region replication doubles or triples costs on all three.
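A simple cost model makes these comparisons reproducible for your own workload. This is a deliberately simplified sketch: the rates below are placeholders to fill in from a vendor's current price sheet, and real invoices add write operations, minimum commitments, and replication on top:

```python
# Generic monthly cost model for a managed vector store. All rates are
# parameters to fill in from a vendor's price sheet; the defaults below
# are placeholders, not quotes.

SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds

def monthly_cost(n_vectors, dim, avg_qps,
                 storage_per_gb=0.10,       # $/GB-month (placeholder)
                 per_million_queries=0.04,  # $/1M queries (placeholder)
                 metadata_overhead=0.25):   # index + metadata, 10-40%
    gb = n_vectors * dim * 4 / 1e9 * (1 + metadata_overhead)
    queries_m = avg_qps * SECONDS_PER_MONTH / 1e6
    return gb * storage_per_gb + queries_m * per_million_queries

# 1M 768-dim vectors at 100 QPS average, placeholder rates:
print(f"${monthly_cost(1_000_000, 768, 100):.2f}/mo")
```

The useful exercise is sweeping `avg_qps` and `n_vectors` across your growth plan with each vendor's real rates, which is usually where the crossover points between the three options show up.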
Decision Framework and Migration Paths
Our decision tree:
- Under 10M vectors, already on Postgres? pgvector. Simplest, cheapest, lowest ops overhead.
- Multi-tenant SaaS with thousands to millions of tenants? Turbopuffer. Namespace overhead beats competitors at scale.
- High-scale consumer app, cost-sensitive? Turbopuffer. Unit economics are meaningfully better.
- Enterprise buyer with strict compliance? Pinecone. SOC 2, HIPAA BAA, VPC peering, audit logs.
- Need fastest p99 latency at moderate scale? pgvector or Pinecone pod.
- Hybrid search is critical? Turbopuffer has cleanest API; pgvector is flexible.
- Self-hosting required? pgvector or open-source alternatives (Qdrant, Weaviate).
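The decision tree above can be codified as a function. The thresholds and priority order mirror the bullets; treat it as a starting point for discussion, not a verdict:

```python
# The decision tree above as a function. Inputs and thresholds mirror the
# bullets; the priority order (self-hosting and compliance first) is an
# assumption about which constraints are hardest.

def pick_vector_store(n_vectors, on_postgres=False, multi_tenant=False,
                      compliance_heavy=False, self_host=False,
                      cost_sensitive=False):
    if self_host:
        return "pgvector (or Qdrant/Weaviate)"
    if compliance_heavy:
        return "Pinecone"
    if multi_tenant or (cost_sensitive and n_vectors > 10_000_000):
        return "Turbopuffer"
    if n_vectors < 10_000_000 and on_postgres:
        return "pgvector"
    return "benchmark Turbopuffer vs Pinecone on your workload"

print(pick_vector_store(5_000_000, on_postgres=True))  # → pgvector
```

Hard constraints (self-hosting, compliance) come first because they eliminate options outright; the cost and scale branches only matter among the survivors.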
Migrating between these systems is painful but doable. The common pattern is to run both in parallel during the migration and shadow queries against the new store to compare correctness. Budget 40 to 100 engineering hours for a clean migration.
Future outlook: expect more competition in 2026 to 2027. Lance, DuckDB's VSS extension, and emerging open-source options continue to close the gap. But for production stability today, Turbopuffer and pgvector cover 90% of use cases.
If you are scoping a RAG architecture and want help choosing the vector store, book a free strategy call.
Need help building this?
Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.