---
title: "Postgres Extensions Guide: pgvector, TimescaleDB, PostGIS 2026"
author: "Nate Laquis"
author_role: "Founder & CEO"
date: "2027-01-15"
category: "Technology"
tags:
  - Postgres extensions
  - pgvector
  - TimescaleDB
  - PostGIS
  - PostgreSQL performance
excerpt: "Postgres extensions turn a solid relational database into a vector search engine, a time-series powerhouse, or a geospatial platform. Here is how to pick the right ones and avoid the mess of running three separate databases."
reading_time: "14 min read"
canonical_url: "https://kanopylabs.com/blog/postgres-extensions-guide-pgvector-timescaledb-postgis"
---

# Postgres Extensions Guide: pgvector, TimescaleDB, PostGIS 2026

## Why Postgres Extensions Beat Standalone Databases

Every few years a new category of database appears and claims you need a dedicated system for vector search, time-series analytics, or geospatial queries. The pitch is compelling: purpose-built tools outperform general-purpose ones. The reality is more nuanced. Running three specialized databases alongside Postgres means three sets of credentials, three backup strategies, three monitoring dashboards, and three points of failure. For most teams under 50 engineers, that operational overhead eats whatever performance advantage you gained.

Postgres extensions flip the equation. Instead of shipping your data to a specialized engine, you bring the specialized engine to your data. pgvector adds vector similarity search directly inside Postgres. TimescaleDB turns your existing instance into a time-series database that rivals InfluxDB. PostGIS makes Postgres one of the most capable geospatial systems on the planet, used by organizations from Uber to the US Census Bureau.

The trade-off is real. A dedicated [vector database like Pinecone or Weaviate](/blog/pinecone-vs-weaviate-vs-qdrant-vector-databases) will outperform pgvector at 100 million vectors. InfluxDB handles petabyte-scale time-series ingestion better than TimescaleDB on a single node. But if your dataset is under 10 million vectors, under 1 billion time-series rows, or under 100 million geospatial records, Postgres extensions handle it comfortably while keeping your architecture simple.

![Server infrastructure representing PostgreSQL database with extensions for vector search and geospatial data](https://images.unsplash.com/photo-1558494949-ef010cbdcc31?w=800&q=80)

We have built production systems on all three extensions. This guide covers what each one does, how to set it up, performance you should expect, cost implications, and when a standalone database makes more sense.

## pgvector: Vector Search Inside Postgres

pgvector is the extension that changed how teams think about vector databases. Released in 2021 and now at version 0.8.x, it stores vector embeddings as a native column type and supports similarity search using cosine distance, inner product, L2 (Euclidean) distance, and L1 (Manhattan) distance. You can run a semantic search query with a single SQL statement instead of calling an external API.

### Setup and Basic Usage

On managed Postgres providers (Supabase, Neon, Render, AWS RDS), enabling pgvector is one command: **CREATE EXTENSION vector;**. Self-hosted installations need the extension package installed first, but every major Linux distribution has packages available. Then create a vector column: **ALTER TABLE products ADD COLUMN embedding vector(1536);** for OpenAI's text-embedding-3-small output.

Querying is straightforward SQL. A nearest-neighbor search looks like: **SELECT id, title FROM products ORDER BY embedding  '[0.1, 0.2, ...]' LIMIT 10;**. The **** operator computes cosine distance. You can combine this with regular WHERE clauses, joins, and aggregations. That composability is pgvector's killer feature. Try doing a vector similarity search filtered by user permissions, joined against an orders table, grouped by category in Pinecone. You cannot.

### Indexing Strategies

Without an index, pgvector performs exact nearest-neighbor search by scanning every row. That works fine up to about 100,000 vectors. Beyond that, you need an approximate nearest-neighbor (ANN) index. pgvector supports two types:

- **IVFFlat:** Divides vectors into lists (clusters) and searches only the closest lists. Faster to build, uses less memory, but lower recall at high speed. Good for datasets under 5 million vectors. Create with: **CREATE INDEX ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);**

- **HNSW:** Builds a hierarchical navigable small world graph. Slower to build (2-5x longer), uses more memory, but significantly better recall/speed trade-off. This is the default recommendation for most workloads. Create with: **CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);**

### Performance Benchmarks and Limits

On a db.r6g.xlarge RDS instance (4 vCPUs, 32GB RAM) running pgvector 0.8 with HNSW indexing, we measured the following on a real client project with 1536-dimension embeddings:

- **1 million vectors:** ~5ms p50 query latency, 98.5% recall at ef_search=40

- **5 million vectors:** ~12ms p50 query latency, 97% recall at ef_search=64

- **10 million vectors:** ~25ms p50 query latency, 96% recall at ef_search=100

Beyond 10 million vectors, you start hitting memory pressure. HNSW indexes need to fit in RAM for good performance, and a 10M-vector HNSW index with 1536 dimensions uses roughly 25-30GB of RAM. At that point, you either need a larger instance ($800+/month on RDS) or should consider a dedicated vector database.

### When to Use a Dedicated Vector Database Instead

Choose Pinecone, Weaviate, or Qdrant when: you exceed 20 million high-dimensional vectors, need sub-5ms latency at scale, require multi-tenancy with thousands of indexes, or your vector workload needs to scale independently from transactional queries. For RAG applications with under 5 million documents, pgvector is the pragmatic choice.

## TimescaleDB: Time-Series Analytics on Postgres

TimescaleDB transforms Postgres into a time-series database by introducing hypertables, which automatically partition data by time intervals. You write standard SQL against what looks like a normal table, but TimescaleDB handles chunking, compression, retention policies, and query optimization for time-ordered data. Timescale Inc. offers both an open-source Apache 2 edition and a paid Timescale Cloud service.

### Core Concepts

A hypertable is a virtual table that automatically creates time-based partitions (chunks). When you insert a row with timestamp 2027-01-15 10:30:00, TimescaleDB routes it to the correct chunk without you thinking about partitioning. Queries that filter by time range only scan relevant chunks, which makes analytics queries over specific windows extremely fast.

Continuous aggregates are materialized views that refresh automatically as new data arrives. Define hourly averages once, and TimescaleDB keeps them updated incrementally. No cron jobs, no manual refresh scripts. This alone saves hours of engineering time on analytics-heavy products.

Native compression reduces storage by 90-95% for time-series data using delta encoding, gorilla compression, and dictionary compression tuned for time-series patterns. A month of IoT sensor data at 50GB uncompressed fits in 3-5GB compressed, and compressed chunks are still queryable with standard SQL.

### Setup and Configuration

Install the extension, then convert any table: **SELECT create_hypertable('sensor_readings', by_range('recorded_at'));**. Set chunk intervals based on query patterns. For IoT data queried by day, use 1-day chunks. For financial data queried by hour, use 1-hour chunks. The default is 7 days, reasonable for most use cases.

Add a compression policy: **SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');**. Add a retention policy: **SELECT add_retention_policy('sensor_readings', INTERVAL '1 year');**. These run as background jobs inside Postgres, no external scheduler needed.

### Performance Characteristics

On a 4-vCPU, 16GB instance ingesting IoT sensor data:

- **Insert throughput:** 200,000-400,000 rows/second with batched inserts (COPY command)

- **Time-range queries:** Sub-100ms for aggregations over 1 million rows within a time window

- **Compression ratio:** 10-20x typical for numeric sensor data

- **Storage cost:** A year of data from 10,000 sensors reporting every 10 seconds (31.5 billion rows) fits in approximately 300GB compressed

### TimescaleDB vs. InfluxDB vs. ClickHouse

InfluxDB is faster for pure time-series ingestion at extreme scale (millions of metrics per second). ClickHouse is faster for analytical queries over billions of rows. But both require separate infrastructure and query languages. TimescaleDB wins when your team already knows Postgres, you need to join time-series data with relational data, or you want one database to manage. For SaaS products tracking user analytics, application metrics, or IoT data under 1 billion rows, TimescaleDB is the right call.

![Data analytics dashboard showing time-series metrics and performance graphs](https://images.unsplash.com/photo-1551288049-bebda4e38f71?w=800&q=80)

## PostGIS: Geospatial Queries at Scale

PostGIS has been the gold standard for geospatial data in relational databases for over two decades. It adds geometry and geography types, spatial indexing (GiST and SP-GiST), and hundreds of functions for distance calculations, polygon intersections, routing, and coordinate transformations. If you have ever used Google Maps, Uber, or Airbnb, the backend probably touches PostGIS somewhere.

### Geometry vs. Geography Types

PostGIS offers two spatial types, and picking the wrong one causes headaches. **Geometry** operates on a flat Cartesian plane and is faster, but distances are in the coordinate system's units (degrees for lat/long, which are meaningless for human-readable distances). **Geography** operates on a spherical earth model and returns distances in meters, but supports fewer functions and is slower. Use geography for global applications needing accurate distances (delivery radius, store locator). Use geometry with a projected coordinate system (like UTM) for regional applications needing complex spatial operations (land parcel analysis, urban planning).

### Indexing for Performance

Spatial queries without indexes are painfully slow. PostGIS uses GiST (Generalized Search Tree) indexes for bounding box lookups: **CREATE INDEX ON locations USING gist (geom);**. For point-only data, SP-GiST indexes can be faster. Always ANALYZE after bulk inserts, or the query planner will ignore the spatial index.

A common "find nearby" pattern: **SELECT name, ST_Distance(geom, ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326)) AS distance FROM locations WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(-73.985, 40.748), 4326), 5000) ORDER BY distance LIMIT 20;**. This finds the 20 closest locations within 5km of a point in Manhattan. With a GiST index on 10 million locations, it returns in under 10ms.

### Common Use Cases We Build

- **Store/restaurant locators:** "Find the 10 nearest locations within 25 miles." PostGIS handles this trivially with ST_DWithin and KNN ordering.

- **Delivery zone management:** Define service areas as polygons, check if a delivery address falls within a zone using ST_Contains. Combine with time-based pricing using a simple join against a rates table.

- **Fleet tracking:** Store vehicle positions as time-stamped points, calculate distances traveled with ST_Length on linestrings, detect geofence entries/exits with ST_Intersects.

- **Real estate search:** Filter properties by polygon (draw-on-map search), calculate distances to amenities, aggregate statistics by neighborhood boundaries.

### PostGIS vs. MongoDB Geospatial vs. Elasticsearch

MongoDB's geospatial queries cover basic use cases (nearby search, polygon containment) but lack the depth of PostGIS: no topology, raster data, 3D geometry, or complex spatial joins. Elasticsearch has geospatial filters but is not a geospatial database. If you just need "sort by distance" on search results, MongoDB or Elasticsearch might be simpler. For anything more complex, PostGIS saves you from reinventing spatial algorithms.

## Running Multiple Extensions Together

The real power of Postgres extensions emerges when you combine them. A logistics platform might use PostGIS for route planning, TimescaleDB for tracking delivery performance, and pgvector for semantic search over product catalogs. All three coexist in the same instance, querying the same tables, sharing the same transaction guarantees.

### Memory and Resource Planning

Each extension adds memory overhead. pgvector's HNSW indexes are memory-hungry. TimescaleDB's background workers consume CPU and I/O. PostGIS spatial indexes add to shared_buffers pressure. Our recommended instance sizes for running all three:

- **Development/staging:** 2 vCPUs, 8GB RAM. Enough for small datasets across all three extensions. Cost: ~$50/month on RDS, ~$30/month on Supabase Pro.

- **Production (moderate load):** 4 vCPUs, 32GB RAM. Handles 5M vectors, 500M time-series rows, and 10M spatial records comfortably. Cost: ~$200/month on RDS, ~$150/month on Neon Scale.

- **Production (heavy load):** 8 vCPUs, 64GB RAM. Required when HNSW indexes exceed 20GB or TimescaleDB compression jobs compete with query traffic. Cost: ~$500/month on RDS.

### Configuration Tuning

Key postgresql.conf settings when running multiple extensions:

- **shared_buffers:** Set to 25% of total RAM. With 32GB RAM, use 8GB. This is shared across all extensions' indexes.

- **effective_cache_size:** Set to 75% of total RAM (24GB on a 32GB instance). Helps the query planner make better decisions about index usage.

- **work_mem:** Increase to 256MB-512MB for complex spatial joins or vector similarity searches. Be careful, this is per-operation, so 20 concurrent queries at 512MB = 10GB.

- **maintenance_work_mem:** Set to 2-4GB for faster index creation (HNSW builds, GiST rebuilds, TimescaleDB compression).

- **max_worker_processes:** TimescaleDB needs background workers. Set to at least the default (8) plus the number of TimescaleDB background jobs you run.

### Managed Postgres Providers and Extension Support

Not every provider supports all three extensions. Here is the current state for managed [Postgres providers](/blog/neon-vs-planetscale-vs-supabase):

- **Supabase:** pgvector (yes, built-in), TimescaleDB (no), PostGIS (yes). Best for pgvector + PostGIS workloads.

- **Neon:** pgvector (yes), TimescaleDB (no), PostGIS (yes). Serverless scaling is great for bursty vector search.

- **Timescale Cloud:** pgvector (yes), TimescaleDB (yes, obviously), PostGIS (yes). The only managed option that supports all three natively.

- **AWS RDS:** pgvector (yes, since late 2023), TimescaleDB (no, use the Timescale AMI on EC2), PostGIS (yes). Best for large-scale production with full AWS integration.

- **Google Cloud SQL:** pgvector (yes), TimescaleDB (no), PostGIS (yes).

If you need all three extensions in a single managed instance, Timescale Cloud is currently the only turnkey option. Otherwise, self-host on a VM (Hetzner at $45/month for 8 vCPUs/32GB is excellent value) or use AWS EC2 with the Timescale AMI.

## Migration Strategies and Common Pitfalls

Adding extensions to an existing Postgres database is straightforward. Migrating from a standalone database into Postgres extensions requires more planning.

### Migrating from Pinecone/Weaviate to pgvector

Export your vectors (most vector databases support bulk export to JSON or Parquet). Create your pgvector table with matching dimensionality. Use COPY to bulk-load vectors (10-50x faster than INSERT). Build your HNSW index after loading all data, not before. Index creation on 5 million vectors takes 15-30 minutes depending on hardware, so schedule during a maintenance window.

The biggest pitfall: assuming pgvector filtering works like your vector database's API. Pinecone applies metadata filters before ANN search (pre-filtering). pgvector applies WHERE clauses as post-filtering by default. For filtered queries, create a partial HNSW index on the filtered subset or combine GIN indexes on metadata columns with the vector index.

### Migrating from InfluxDB to TimescaleDB

InfluxDB's line protocol does not map directly to SQL tables. The general pattern: each measurement becomes a hypertable, tags become indexed columns, fields become regular columns. Export from InfluxDB in CSV format, transform with a script, and COPY into TimescaleDB. Watch out for schema width. InfluxDB encourages wide tables, but TimescaleDB performs better with narrower tables. If your measurement has 200 fields, split it into multiple hypertables joined by device_id and timestamp.

### Migrating from MongoDB Geospatial to PostGIS

MongoDB stores geospatial data as GeoJSON, and PostGIS speaks GeoJSON natively. Export your collection, then use ST_GeomFromGeoJSON to import. The tricky part: MongoDB assumes WGS84 (SRID 4326). Make sure your PostGIS data uses the same SRID, or distance calculations will be wrong.

![Code editor displaying PostgreSQL database migration queries and configuration](https://images.unsplash.com/photo-1555949963-ff9fe0c870eb?w=800&q=80)

### Pitfalls That Burn Teams

- **Not testing index build times:** HNSW index creation locks the table. On 10M vectors, that is 30-60 minutes of downtime unless you use CREATE INDEX CONCURRENTLY (which takes 2-3x longer but does not lock).

- **Ignoring VACUUM:** Heavy insert/update workloads on hypertables and vector tables generate dead tuples fast. Tune autovacuum aggressively: autovacuum_vacuum_scale_factor = 0.01 and autovacuum_analyze_scale_factor = 0.005.

- **Undersizing shared_buffers:** If your HNSW index is 20GB and shared_buffers is 4GB, every vector query hits disk. This turns 5ms queries into 50ms queries.

- **Mixing OLTP and OLAP on the same instance:** A heavy TimescaleDB continuous aggregate refresh can spike CPU and slow your transactional pgvector queries. Use read replicas or connection pooling (PgBouncer) to isolate workloads.

## Other Postgres Extensions Worth Knowing

pgvector, TimescaleDB, and PostGIS get the most attention, but Postgres has a rich extension ecosystem. Here are the ones we install on almost every production database.

### pg_stat_statements

Tracks execution statistics for all SQL queries. Essential for identifying slow queries, high-frequency queries, and queries that scan too many rows. Enable it in every environment. It is the single most useful extension for performance debugging, and it is included with Postgres by default.

### pg_trgm (Trigram Matching)

Adds trigram-based text similarity functions and GIN/GiST index support for LIKE and ILIKE queries. Without pg_trgm, a query like **WHERE name ILIKE '%pizza%'** performs a full table scan. With a GIN trigram index, it uses the index. If you have any user-facing text search that does not warrant full-text search or pgvector semantic search, pg_trgm is the answer.

### pgcrypto

Provides cryptographic functions including password hashing (crypt + gen_salt with bcrypt) and column-level encryption. Useful for encrypting PII at rest without application-level encryption complexity.

### pg_partman

Automated table partitioning management. If you need partitioned tables for large audit logs or event tables but do not need TimescaleDB's full feature set, pg_partman handles partition creation, maintenance, and retention automatically.

### Citus (Distributed Postgres)

Shards tables across multiple Postgres nodes for horizontal scaling. Now owned by Microsoft and available on Azure Cosmos DB for PostgreSQL. Consider Citus above 5TB of data or 100K queries/second. Most applications never reach that threshold.

### pg_cron

Runs scheduled jobs inside Postgres. No external cron daemon needed. Syntax: **SELECT cron.schedule('0 3 * * *', 'CALL cleanup_expired_sessions()');**

## Choosing the Right Architecture for Your Project

After building dozens of systems with these extensions, here is our decision framework.

### Single Extension, Moderate Scale

If you need one extension (just vector search, just time-series, or just geospatial) and your dataset fits comfortably in a single instance (under 500GB), use a managed Postgres provider that supports your extension. Supabase for pgvector, Timescale Cloud for TimescaleDB, any major provider for PostGIS. Cost: $25-$200/month. This is the sweet spot for most startups and small-to-medium SaaS products.

### Multiple Extensions, Moderate Scale

If you need two or three extensions with moderate data volumes (under 1TB total), run them on a single Postgres instance. Timescale Cloud supports all three. Self-hosted on Hetzner or AWS EC2 gives you full control. Cost: $100-$500/month. This architecture handles surprisingly large workloads. A well-tuned 8-vCPU/64GB instance can serve 5 million vectors, 1 billion time-series rows, and 50 million spatial records simultaneously.

### High Scale or Isolated Workloads

When one workload dominates and needs independent scaling, split it out. Keep your relational data and smaller extension workloads in Postgres, and move the heavy workload to a dedicated system. Common patterns:

- **Heavy vector search (50M+ vectors):** Move to Qdrant or Weaviate. Keep relational data and geospatial queries in Postgres.

- **Heavy time-series (10B+ rows, 1M+ inserts/second):** Move to ClickHouse for analytics, keep TimescaleDB for real-time dashboards and recent data.

- **Heavy geospatial (complex topology, nationwide routing):** PostGIS is usually still the answer, but on a dedicated, larger instance with tuned parameters.

### Our Recommendation for Most Teams

Start with a single Postgres instance and add extensions as you need them. Do not pre-optimize by spinning up Pinecone, InfluxDB, and a separate PostGIS instance before you have proven your product needs that scale. The [choice between Postgres and other databases](/blog/postgresql-vs-mongodb) often comes down to operational simplicity, and extensions make Postgres the most operationally simple path for multi-workload applications.

Postgres extensions are one of the strongest arguments for choosing Postgres as your primary database. You get a single system that handles relational data, vector embeddings, time-series analytics, and geospatial queries. One query language, one backup system, one set of metrics. That simplicity compounds over years of development.

If you are evaluating database architecture for a new product or considering migrating specialized workloads back into Postgres, we can help. Our team has shipped pgvector at scale, TimescaleDB for IoT platforms, and PostGIS for logistics applications. [Book a free strategy call](/get-started) and we will map out the right architecture for your requirements.

---

*Originally published on [Kanopy Labs](https://kanopylabs.com/blog/postgres-extensions-guide-pgvector-timescaledb-postgis)*
