Technology · 14 min read

Pinecone vs Weaviate vs pgvector: Vector Databases for AI Apps

Every RAG pipeline and AI search app needs a vector database, but the right choice depends on your scale, budget, and ops capacity. Here is an honest breakdown of Pinecone, Weaviate, and pgvector with real costs, latency benchmarks, and migration advice.

Nate Laquis

Founder & CEO

Why Every RAG and AI Search App Needs a Vector Database

If you are building anything that involves retrieval-augmented generation, semantic search, or recommendation engines, you need a place to store and query embeddings. A traditional relational database can technically store vectors, but querying them with approximate nearest neighbor (ANN) search at sub-100ms latency across millions of records is a different problem entirely. That is what vector databases are built for.

The core workflow is straightforward. You generate embeddings from your documents using a model like OpenAI text-embedding-3-small or Cohere embed-v3 (see our embedding models comparison for a full breakdown). Those embeddings, typically 768 to 3072 dimensional float arrays, get stored in a vector database with associated metadata. At query time, you embed the user question, run an ANN search to find the top-k most similar vectors, then pass those retrieved chunks to your LLM as context. The vector database is the backbone of that retrieval step.
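The whole loop can be sketched end to end in a few lines. The `embed` function below is a toy stand-in for a real embedding model like text-embedding-3-small, and brute-force cosine scoring stands in for the ANN search a real vector database performs:

```python
import math

# Toy stand-in for a real embedding model (e.g. text-embedding-3-small).
# Real embeddings are 768-3072 dims; this uses 8 character-frequency features.
def embed(text: str) -> list[float]:
    vec = [0.0] * 8
    for ch in text.lower():
        vec[ord(ch) % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # normalize so dot product = cosine

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are pre-normalized

def top_k(query: str, store: list[dict], k: int = 3) -> list[dict]:
    # Embed the user question, rank stored chunks by similarity, return top-k.
    q = embed(query)
    ranked = sorted(store, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return ranked[:k]

# Ingest: embed each document and keep metadata alongside its vector.
docs = ["postgres tuning guide", "pinecone pricing overview", "how to bake bread"]
store = [{"id": i, "text": t, "vector": embed(t)} for i, t in enumerate(docs)]

# Query time: the retrieved chunks would be passed to the LLM as context.
results = top_k("vector database pricing", store, k=2)
```

A production system swaps `embed` for an API call and `top_k` for an indexed ANN query, but the shape of the pipeline stays the same.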


The market has exploded with options. Pinecone, Weaviate, pgvector, Qdrant, Milvus, Chroma, LanceDB. Each makes trade-offs between managed convenience, cost, performance, and flexibility. After deploying vector search systems for dozens of production apps, we have strong opinions on which tool fits which situation. This guide focuses on the three we recommend most often: Pinecone, Weaviate, and pgvector. We will also cover Qdrant and Milvus as alternatives worth considering.

Pinecone: The Easiest Path to Production Vector Search

Pinecone is a fully managed, cloud-native vector database. You do not deploy anything. You do not manage infrastructure. You create an index via their API, upsert vectors, and query. That simplicity is its biggest selling point, and for many teams, it is the right trade-off.

On pricing, Pinecone offers a serverless tier that starts at $0 for up to 2GB of storage and around 100K vectors at 1536 dimensions. That is genuinely useful for prototyping and small production workloads. Beyond that, serverless pricing is consumption-based: roughly $0.33 per 1M read units and $2 per GB of storage per month. For teams that need dedicated throughput, pod-based plans start at around $70/month for a single s1.x1 pod. A production deployment with 5M vectors and moderate query volume typically runs $100 to $300/month on serverless, or $280 to $700/month on pods depending on replica configuration.

Where Pinecone excels: onboarding speed. A developer can go from zero to a working semantic search endpoint in under an hour. The Python and Node SDKs are polished, the documentation is thorough, and features like metadata filtering, namespaces, and sparse-dense hybrid search work out of the box. If you are building a RAG architecture and just need retrieval to work reliably without thinking about infrastructure, Pinecone is the path of least resistance.
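That endpoint can be sketched with the Pinecone Python SDK (v3-style client). The index name "docs", the namespace, and the metadata fields here are hypothetical; swap in your own:

```python
def semantic_search(api_key: str, q_vector: list[float]):
    # Deferred import so this sketch can be read and loaded without the SDK.
    from pinecone import Pinecone

    pc = Pinecone(api_key=api_key)
    index = pc.Index("docs")  # hypothetical, pre-created serverless index

    # Upsert: each record is an id, a vector, and arbitrary metadata.
    index.upsert(
        vectors=[{"id": "doc-1", "values": q_vector, "metadata": {"source": "faq"}}],
        namespace="tenant-a",
    )

    # Query: top-10 nearest neighbors, restricted by a metadata filter.
    return index.query(
        vector=q_vector,
        top_k=10,
        namespace="tenant-a",
        filter={"source": {"$eq": "faq"}},
        include_metadata=True,
    )
```

Namespaces give you cheap per-tenant isolation within one index, which is usually enough before you need separate indexes.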

Where Pinecone falls short: vendor lock-in is real. Your data lives in Pinecone's proprietary infrastructure. There is no self-hosted option, no way to export your indexes to another system without re-indexing, and no escape hatch if pricing changes or the service has an outage. We have seen teams get stuck on Pinecone when they needed features like custom distance metrics or tighter integration with their existing data pipeline. The serverless cold-start latency can also spike to 200ms+ for infrequently accessed namespaces, which matters for latency-sensitive applications.

Weaviate: Open-Source Power with Hybrid Search Built In

Weaviate is an open-source vector database written in Go that you can self-host on your own infrastructure or run via Weaviate Cloud. It occupies a sweet spot between the simplicity of a managed service and the control of running your own stack.

The standout feature is native hybrid search. Weaviate combines dense vector search (semantic similarity) with BM25 sparse keyword search in a single query, automatically fusing the results. This matters because pure vector search has a well-known weakness: it can miss exact keyword matches that a user clearly intended. If someone searches for "error code PG-4012," a vector search might return generically similar error documentation. Weaviate's hybrid search will find the exact match and rank it appropriately. For production search systems, this is table stakes, and Weaviate handles it natively rather than requiring you to bolt on a separate keyword index.
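A hybrid query sketch with the Weaviate Python client (v4-style API). The "ErrorDocs" collection name is hypothetical; `alpha` controls how the two result sets are fused:

```python
def hybrid_search(query_text: str):
    # Deferred import so this sketch loads without the client installed.
    import weaviate

    # connect_to_weaviate_cloud(...) for managed clusters instead.
    client = weaviate.connect_to_local()
    try:
        docs = client.collections.get("ErrorDocs")  # hypothetical collection
        # alpha balances the fusion: 0 = pure BM25 keyword, 1 = pure vector.
        return docs.query.hybrid(query=query_text, alpha=0.5, limit=10)
    finally:
        client.close()
```

For the "error code PG-4012" case, lowering `alpha` biases results toward exact keyword matches; tune it against your own query logs rather than trusting a default.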


On pricing, self-hosting Weaviate is free (it is BSD-3 licensed). You pay only for the compute. A reasonable production setup on AWS runs on a single r6g.xlarge instance (4 vCPU, 32GB RAM) at roughly $150/month, handling 5-10M vectors with good performance. Weaviate Cloud starts at $25/month for a sandbox, with production tiers at $135/month and up. The cloud option saves you the ops burden of managing Kubernetes deployments, backups, and upgrades.

Weaviate also supports multi-tenancy natively, which is critical if you are building a SaaS product where each customer needs isolated vector data. It ships with built-in vectorization modules for OpenAI, Cohere, Hugging Face, and others, meaning you can skip the "generate embeddings separately" step entirely and let Weaviate handle it on ingest. The GraphQL API is flexible, though some developers find it verbose compared to a simple REST interface.

The downsides: self-hosting requires Kubernetes knowledge and ongoing maintenance. Weaviate's memory consumption can be aggressive: plan for roughly 2x the raw vector data size in RAM for HNSW indexes. Upgrades between major versions have historically required careful migration planning, though this has improved significantly in recent releases.

pgvector: Zero New Infrastructure, Serious Trade-Offs at Scale

pgvector is a PostgreSQL extension that adds vector storage and ANN search to the database you are probably already running. No new service, no new SDK, no new deployment pipeline. You add the extension, create a column with the vector type, build an index, and query with SQL. For teams that want to avoid introducing another piece of infrastructure, pgvector is extremely compelling.

The cost model is unbeatable: it is free, open-source, and runs on whatever Postgres instance you already have. If you are on RDS, Supabase, Neon, or any managed Postgres provider, you can enable pgvector with a single command. There is no additional cost beyond the compute and storage you are already paying for. A team running a db.r6g.large RDS instance at ~$130/month can store and query millions of low-dimensional vectors without any extra spend.

For applications under 1M vectors with embedding dimensions of 1536 or below, pgvector performs well. Query latency typically sits in the 10-50ms range with a properly tuned HNSW index, which is fast enough for most user-facing applications. The ability to join vector search results with your relational data in a single SQL query is a genuine superpower. You can filter by user ID, date range, status, or any other column and combine it with semantic similarity in one query plan. No other vector database gives you that without extra integration work.
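That combined query looks like ordinary SQL. A sketch of the schema and search query, assuming a hypothetical `chunks` table (run these through psycopg or your driver of choice; the `pgvector` Python adapter handles passing lists as vector parameters):

```python
# Schema sketch for pgvector; table and column names are hypothetical.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    body       text NOT NULL,
    embedding  vector(1536)  -- matches text-embedding-3-small dimensions
);

-- HNSW index on cosine distance; build it after bulk-loading the data.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Semantic similarity combined with relational filters in one query plan.
# <=> is pgvector's cosine distance operator (smaller = more similar).
SEARCH_SQL = """
SELECT id, body, embedding <=> %(query_vec)s AS distance
FROM chunks
WHERE user_id = %(user_id)s
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
"""
```

The `WHERE` clause and the `ORDER BY` on distance run in the same plan, which is exactly the join-with-relational-data capability the standalone vector databases cannot match without extra plumbing.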

The trade-offs become apparent at scale. Beyond 5M vectors at 1536 dimensions, pgvector starts requiring serious hardware. Index build times climb to hours. Memory consumption for HNSW indexes is substantial, often requiring 2-4x the raw vector data in shared_buffers. Query latency degrades under concurrent load in ways that purpose-built vector databases handle more gracefully. We have seen p99 latencies jump from 30ms to 500ms+ when a pgvector instance is handling 50+ concurrent vector queries alongside transactional workloads. That resource contention is the real killer: your vector search and your application database are fighting for the same CPU, memory, and I/O.

Our recommendation: start with pgvector if you have under 1M vectors and your Postgres instance has headroom. It saves you real operational complexity. But plan your migration path before you hit the ceiling, not after.

Qdrant and Milvus: Alternatives Worth Knowing

Pinecone, Weaviate, and pgvector cover most use cases, but two other options deserve mention because they excel in specific scenarios.

Qdrant is a Rust-based, open-source vector database that has quietly become one of the fastest options available. It can keep original vectors on disk (memmap storage) and hold only compressed copies in RAM via quantization, which dramatically reduces infrastructure costs for large collections. A 10M vector deployment on Qdrant can run on a machine with 8GB RAM where Weaviate would need 32GB+. Qdrant also has excellent payload filtering, allowing complex metadata filters to run alongside vector search without performance degradation. Self-hosted Qdrant on a c6g.xlarge instance runs about $100/month. Qdrant Cloud managed pricing starts around $25/month for small workloads.

Milvus is the heavy-hitter for massive scale. Originally developed by Zilliz, Milvus is designed for billion-scale vector workloads with a distributed architecture that separates compute, storage, and coordination. If you are building something that needs to handle 100M+ vectors with sub-50ms latency, Milvus is purpose-built for that. The trade-off is operational complexity. A production Milvus deployment requires etcd, MinIO (or S3), and multiple Milvus node types. The managed option, Zilliz Cloud, abstracts this away, with pricing starting around $65/month for a single compute unit. We generally recommend Milvus only when your scale genuinely demands it, because the infrastructure overhead is significant for smaller workloads.

Benchmarks: Latency, Recall, and Throughput Compared

Benchmarks are tricky because performance depends heavily on configuration, hardware, dataset size, and query patterns. That said, here are the numbers we have observed across production deployments and standardized tests using the ANN Benchmarks suite on a 1M vector dataset with 1536 dimensions and HNSW indexing.

Query latency (p50, single query, top-10 retrieval):

  • Pinecone serverless: 15-25ms (warm), 150-300ms (cold start)
  • Weaviate (self-hosted, 32GB RAM): 8-15ms
  • pgvector (RDS r6g.xlarge, HNSW): 12-30ms
  • Qdrant (self-hosted, 16GB RAM): 5-12ms
  • Milvus (single node): 10-20ms

Recall at top-10 (accuracy of nearest neighbor results):

  • Pinecone: 0.95-0.98 (not configurable, optimized internally)
  • Weaviate HNSW (ef=128): 0.96-0.99
  • pgvector HNSW (ef_search=100): 0.94-0.97
  • Qdrant HNSW (ef=128): 0.97-0.99

Throughput (queries per second, 10 concurrent clients):

  • Pinecone serverless: 200-500 QPS (scales automatically)
  • Weaviate: 800-1,500 QPS on a single node
  • pgvector: 300-600 QPS (heavily dependent on concurrent transactional load)
  • Qdrant: 1,000-2,000 QPS on a single node

The takeaway: for raw speed, Qdrant and Weaviate lead the pack on self-hosted deployments. Pinecone trades some performance for zero operational overhead. pgvector is competitive at moderate scale but falls behind under heavy concurrent load. These numbers shift significantly at 10M+ vectors, where purpose-built databases maintain performance and pgvector degrades more sharply.

Indexing Strategies: HNSW vs IVFFlat and When Each Matters

The indexing algorithm you choose has a bigger impact on performance than which database you pick. Two strategies dominate the vector database landscape: HNSW and IVFFlat.

HNSW (Hierarchical Navigable Small World) is the default choice for most production workloads, and for good reason. It builds a multi-layered graph where each vector connects to its nearest neighbors across multiple hierarchy levels. A query starts at an entry point in the sparse top layer and navigates through progressively denser layers until it converges on the nearest neighbors. The result: consistently fast queries with high recall, even at scale.

HNSW parameters that matter: M (number of connections per node, typically 16-64) controls the graph density. Higher M means better recall but more memory. efConstruction (typically 128-512) controls index build quality. Higher values produce a better graph but take longer to build. efSearch (typically 64-256) controls query-time accuracy. You can tune this per query to trade latency for recall. In practice, M=32 and efConstruction=256 is a solid starting point for most workloads.

IVFFlat (Inverted File with Flat storage, meaning vectors are kept uncompressed) takes a different approach. It clusters vectors into partitions (called Voronoi cells), then at query time only searches the closest partitions. It is faster to build than HNSW and uses less memory, but query performance is generally worse because accuracy depends heavily on how many partitions (nprobe) you search. Set nprobe too low and you miss relevant results. Set it too high and you lose the performance benefit of partitioning.

In pgvector, you have a clear choice between the two. Use HNSW for anything user-facing where recall matters. Use IVFFlat only if your dataset is large enough that HNSW index build times become impractical (generally 10M+ vectors on constrained hardware) and you can tolerate lower recall. Weaviate and Qdrant use HNSW exclusively. Pinecone uses a proprietary index that behaves similarly to HNSW but is not configurable.
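In pgvector, the parameters above map directly onto index DDL and session settings. A sketch of both strategies, with the `chunks`/`embedding` names hypothetical:

```python
# HNSW: M and efConstruction are set at build time via WITH (...).
HNSW_INDEX = """
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 256);
"""
# efSearch is a per-session knob: higher = better recall, slower queries.
HNSW_TUNE = "SET hnsw.ef_search = 100;"

# IVFFlat: choose the partition count at build time; a common rule of
# thumb is lists ~ rows / 1000 for tables up to a few million rows.
IVFFLAT_INDEX = """
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
"""
# probes is the query-time nprobe equivalent: more probes, better recall.
IVFFLAT_TUNE = "SET ivfflat.probes = 10;"
```

Note that IVFFlat requires data in the table before indexing (the clusters are learned from it), which reinforces the load-then-index ordering discussed below for HNSW as well.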

One important detail for pgvector users: always build your HNSW index after loading your data, not before. Building an HNSW index incrementally (inserting vectors one at a time into an existing index) is significantly slower and produces a lower-quality graph than building it in batch on a pre-loaded table.

When to Use Each: A Decision Framework

After deploying these systems across startups, mid-market SaaS companies, and enterprise teams, here is the framework we use to recommend a vector database. The right answer almost always comes down to three factors: team ops capacity, data scale, and how tightly vector search needs to integrate with your existing stack.

Choose Pinecone when:

  • Your team has zero infrastructure engineers and you need vector search working this week
  • Your dataset is under 10M vectors and cost predictability matters more than raw performance
  • You are building a prototype or MVP and want the fastest path to production
  • You are comfortable with vendor lock-in as a trade-off for reduced operational burden

Choose Weaviate when:

  • You need hybrid search (keyword + semantic) as a core feature, not an afterthought
  • Your team can manage Kubernetes or you are willing to pay for Weaviate Cloud
  • Multi-tenancy is a requirement (SaaS products with per-customer data isolation)
  • You want the option to self-host for compliance, data residency, or cost control at scale

Choose pgvector when:

  • You already run Postgres and have under 1M vectors
  • Vector search needs to join with relational data in the same query
  • You want to avoid adding another service to your infrastructure
  • Your query concurrency is moderate (under 50 concurrent vector queries)

Choose Qdrant when: you need top-tier latency on a budget, your dataset is large (5M+), and you can self-host or use their cloud. Choose Milvus when: you are operating at 100M+ vectors and have the engineering team to manage a distributed system.

For teams building their first AI search system, we almost always start the conversation with pgvector or Pinecone, depending on whether they have existing Postgres infrastructure. Weaviate enters the picture when hybrid search or self-hosting is a hard requirement.

Migration Considerations and Avoiding Lock-In

One of the most overlooked aspects of choosing a vector database is the migration cost. Moving between vector databases is not like switching from MySQL to Postgres, where you can dump and restore schemas. Vector indexes cannot be transferred. You will re-embed your entire dataset or, at minimum, export raw vectors and re-index them in the new system. For a 10M document corpus, re-embedding with OpenAI text-embedding-3-small costs roughly $13 and takes 4-8 hours. The re-indexing step in the destination database adds another 2-6 hours depending on the system.
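That $13 figure is easy to sanity-check. The arithmetic below assumes short chunks averaging ~65 tokens and text-embedding-3-small's $0.02 per 1M tokens rate; the average chunk size is an assumption that depends entirely on your chunking strategy, so plug in your own numbers:

```python
# Back-of-envelope re-embedding cost: tokens * per-token price.
def reembedding_cost(num_docs: int, avg_tokens: int, price_per_1m: float) -> float:
    total_tokens = num_docs * avg_tokens
    return total_tokens / 1_000_000 * price_per_1m

# 10M short chunks (~65 tokens each, an assumption) at $0.02 per 1M tokens:
cost = reembedding_cost(10_000_000, 65, 0.02)  # → 13.0 dollars
```

With 500-token chunks the same corpus costs $100, so measure your real token counts before budgeting a migration.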

To minimize lock-in, we recommend three practices from day one. First, always store your raw text alongside your vectors, either in the vector database metadata or in a separate document store. Never rely on the vector database as your only copy of the data. Second, abstract your vector database behind an interface in your code. A simple class with methods like upsert(), query(), and delete() means swapping backends requires changing one implementation file, not every call site. Third, keep track of which embedding model and version you used. If you switch models during a migration, you will need to re-embed everything, and mixing embeddings from different models in the same index produces garbage results.
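The second practice is worth a concrete sketch. The interface below is illustrative, not a standard; the in-memory implementation is a toy that doubles as a test fixture while your real backends (Pinecone, Weaviate, pgvector) each get their own implementation file:

```python
from abc import ABC, abstractmethod
from typing import Any

class VectorStore(ABC):
    """Backend-agnostic interface; swap implementations, not call sites."""

    @abstractmethod
    def upsert(self, ids: list[str], vectors: list[list[float]],
               metadata: list[dict[str, Any]]) -> None: ...

    @abstractmethod
    def query(self, vector: list[float], top_k: int = 10) -> list[dict[str, Any]]: ...

    @abstractmethod
    def delete(self, ids: list[str]) -> None: ...

class InMemoryStore(VectorStore):
    """Toy reference implementation, handy for unit tests."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict[str, Any]]] = {}

    def upsert(self, ids, vectors, metadata):
        for i, v, m in zip(ids, vectors, metadata):
            self._rows[i] = (v, m)

    def query(self, vector, top_k=10):
        def score(v):  # dot product as a stand-in similarity measure
            return sum(a * b for a, b in zip(vector, v))
        rows = [{"id": i, "score": score(v), "metadata": m}
                for i, (v, m) in self._rows.items()]
        return sorted(rows, key=lambda r: r["score"], reverse=True)[:top_k]

    def delete(self, ids):
        for i in ids:
            self._rows.pop(i, None)
```

Application code depends only on `VectorStore`, so a migration becomes "write one new subclass and re-index" rather than a hunt through every call site.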

The migration path from pgvector to Weaviate or Qdrant is the smoothest because you can export vectors directly from Postgres with a simple SELECT query. Moving out of Pinecone requires using their fetch API to retrieve vectors in batches, which is rate-limited and slow for large indexes. This is another reason we push teams to keep a copy of their source data outside Pinecone.

If you are early in your AI journey and evaluating vector databases for a new project, we can help you avoid the common pitfalls and pick the right stack from the start. Book a free strategy call and we will walk through your architecture, data scale, and requirements to find the best fit.

Need help building this?

Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.

Tags: vector database comparison · Pinecone · Weaviate · pgvector · RAG vector store

Ready to build your product?

Book a free 15-minute strategy call. No pitch, just clarity on your next steps.

Get Started