Traditional Search Is Not Broken, But It Is Holding Your SaaS Back
Let us be clear about something upfront: traditional keyword search still works. BM25, the ranking function behind Elasticsearch and most full-text search engines, is battle-tested and has served the web for decades. If your users search for exact product names, order IDs, or known terms, BM25 will find them. The problem is that user expectations have changed dramatically, and keyword search has not kept up.
In a SaaS product, search is a critical workflow, not a nice-to-have feature. Your users search for help articles, features, settings, teammates, records, and reports dozens of times per day. When they type "how to invite someone to my workspace" and your keyword engine returns nothing because your docs use the phrase "add team members," that is a failure. It is not a catastrophic failure. It is the kind of quiet friction that accumulates into support tickets, frustration, and eventually churn.
We have audited search implementations for over 30 SaaS products at this point. The pattern is remarkably consistent. Keyword search delivers a useful result about 55 to 65% of the time for natural language queries. The remaining 35 to 45% either returns irrelevant results or nothing at all. For exact-match lookups (IDs, names, codes), it performs at 95%+. The gap between those two numbers is exactly where AI search fits in.
The shift from keyword to AI search is not about throwing away what works. It is about layering intelligence on top of your existing infrastructure so you can handle the queries that BM25 was never designed for. If you are evaluating whether your SaaS product needs this upgrade, the honest litmus test is simple: pull your search analytics, look at your null-result rate and your low-click queries, and calculate how many users hit a dead end every week. If that number makes you uncomfortable, keep reading.
BM25 vs Semantic Search vs Hybrid: Understanding the Three Approaches
Before you can plan a migration, you need to understand what you are migrating to. There are three distinct search paradigms, and each one has real strengths and real weaknesses. The right answer for your product depends on your data, your users, and your budget.
BM25 Keyword Search
BM25 (Best Matching 25) is the default ranking algorithm in Elasticsearch, OpenSearch, and most traditional search engines. It scores documents based on term frequency, inverse document frequency, and document length. In plain terms, it rewards documents that contain your search terms frequently (with diminishing returns for each repeat), discounts common words that appear everywhere, and normalizes for document length so long documents do not get an unfair advantage simply because they contain more words.
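To make the three ingredients concrete, here is a minimal sketch of the Okapi BM25 formula in plain Python. The defaults `k1=1.2` and `b=0.75` are the values commonly used by Lucene-based engines; the function name, the whitespace tokenization, and the three sample documents are illustrative assumptions, not how any particular engine is implemented.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # no lexical match means no contribution at all
            # Rare terms get a high IDF; terms found in every document get close to zero.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are discounted.
            norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

# Hypothetical help-center titles, tokenized by whitespace.
docs = [
    "add team members to your workspace".split(),
    "delete a workspace".split(),
    "billing and invoices for your workspace plan and your team".split(),
]
scores = bm25_scores("invite someone to my workspace".split(), docs)
print(scores)
```

Note that the word "invite" contributes nothing to any score because no document contains it. The first document still ranks highest, but only because of the incidental matches on "to" and "workspace".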
Strengths: BM25 is fast, predictable, well understood, and excellent at exact-match retrieval. It handles structured filters (date ranges, categories, status fields) natively. It is also deterministic, meaning the same query always produces the same results, which makes debugging straightforward. For SaaS products with technical users who search using precise terms, BM25 is perfectly adequate.
Weaknesses: BM25 has zero understanding of meaning. It cannot bridge vocabulary gaps (users say "remove" but your docs say "delete"), cannot handle natural language questions, and struggles with typos beyond basic fuzzy matching. Relevance tuning requires manual boosting rules, synonym lists, and custom analyzers that become difficult to maintain as your content grows.
Semantic Vector Search
Semantic search uses embedding models to convert text into high-dimensional vectors, then finds results by measuring the mathematical similarity between the query vector and document vectors. Instead of matching keywords, it matches meaning. "How do I remove a team member" and "delete user from workspace" produce similar vectors because the embedding model understands they describe the same intent.
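The "mathematical similarity" is usually cosine similarity. The sketch below ranks two documents against a query using hand-written 3-dimensional vectors. These vectors are invented purely for illustration; a real embedding model would produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for real document embeddings.
doc_vectors = {
    "delete user from workspace": [0.9, 0.1, 0.2],
    "export monthly invoice": [0.1, 0.9, 0.3],
}
# Pretend embedding of the query "how do I remove a team member".
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    doc_vectors,
    key=lambda doc: cosine_similarity(query_vector, doc_vectors[doc]),
    reverse=True,
)
print(ranked[0])
```

The query shares no keywords with the top result; the ranking comes entirely from the vectors pointing in a similar direction.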
Strengths: Semantic search handles synonyms, paraphrases, and natural language queries without any manual configuration. There are no synonym lists to maintain, no boost rules to tweak. It dramatically improves recall for conversational queries, which is the fastest-growing query type in SaaS products thanks to users trained by ChatGPT and Google.
Weaknesses: Pure vector search struggles with exact-match queries. If a user searches for ticket ID "TKT-29471," semantic search might return tickets about similar topics instead of that exact ticket. It also has limited transparency. Explaining why a specific result ranked higher than another is much harder with vector similarity than with term-frequency scores. Embedding quality depends entirely on the model you choose, and no model is perfect across all domains.
Hybrid Search
Hybrid search runs both BM25 keyword matching and vector similarity search in parallel, then merges the results using a fusion algorithm like Reciprocal Rank Fusion (RRF). You get the exact-match precision of keywords and the semantic understanding of vectors in a single query. For a deeper technical breakdown of how to build this, check out our complete AI search architecture guide.
Our recommendation: Hybrid search is the right default for any SaaS product migrating from traditional search. It preserves everything that already works with BM25 while adding the semantic layer your users increasingly expect. The latency overhead of running two retrieval paths in parallel is typically 5 to 15ms for the retrieval step itself, which is imperceptible to users; embedding the query adds more, as the latency breakdown later shows. The cost overhead is real but manageable, as we will cover in detail later.
Embedding Models and Reranking: The Engine Behind AI Search Quality
Choosing your embedding model is the single most impactful decision in your AI search migration. The model determines how well your system understands what users mean when they type a query. A strong embedding model with a simple vector database will outperform a weak model with the most sophisticated infrastructure money can buy.
For most SaaS products, start with OpenAI text-embedding-3-large. It produces 3,072-dimensional vectors, ranks consistently in the top tier on MTEB retrieval benchmarks, and costs $0.13 per million tokens. For a SaaS product with 100,000 documents averaging 200 tokens each (20 million tokens), your initial indexing cost is about $2.60. Ongoing query embedding at 10,000 searches per day (averaging 10 tokens per query) comes to 100,000 tokens, or roughly $0.013 per day. These are negligible costs relative to the search quality improvement.
If you need multilingual support, Cohere embed-v4 is the stronger choice. It handles 100+ languages natively and outperforms OpenAI on cross-lingual retrieval tasks. Pricing is $0.10 per million tokens, slightly cheaper than OpenAI. For SaaS products serving international markets, this is the model to benchmark first.
Self-hosting is viable, but the break-even volume is high. Open-source models like BGE-M3 and Nomic Embed v2 run on a single NVIDIA A10G GPU (roughly $1.00/hour on-demand on AWS, or about $720/month if it runs continuously), handling 300 to 500 requests per second. At $0.13 per million tokens, that instance only matches the OpenAI bill at roughly 5.5 billion tokens per month. Below that volume the API is cheaper, and the case for self-hosting rests on lower query latency rather than on cost.
Why Reranking Matters More Than You Think
Reranking is the second stage of an AI search pipeline that most teams skip and then regret skipping. After your initial retrieval (BM25 + vector) returns the top 50 to 100 candidates, a reranking model re-scores those candidates using a more computationally expensive cross-encoder that considers the full interaction between the query and each document. This is far more accurate than the bi-encoder used for initial retrieval because it processes the query and document together rather than independently.
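The two-stage shape can be sketched independently of any vendor. In the example below, `rerank` takes the candidates from first-stage retrieval and a pair-scoring function. `toy_pair_score` is a deliberately crude placeholder for a real cross-encoder (it only counts shared words), and both function names are hypothetical; in production you would swap in a call to your reranking model.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Second stage: score every (query, document) pair jointly and keep the best top_n."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    # sorted() is stable, so ties keep their first-stage order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

def toy_pair_score(query: str, doc: str) -> float:
    """Placeholder for a cross-encoder: fraction of query words present in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

# Pretend these came back from hybrid retrieval, in first-stage order.
candidates = [
    "Delete a workspace",
    "Workspace billing settings",
    "Add team members to a workspace",
]
top = rerank("add members to workspace", candidates, toy_pair_score, top_n=2)
print(top)
```

The key property is that the scorer sees the query and the document together, which is what lets a real cross-encoder catch interactions a bi-encoder misses.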
Cohere Rerank 3.5 is the production standard. It costs $2 per 1,000 rerank requests, each handling up to 100 documents. For a SaaS product processing 10,000 searches per day, that is $20/day or roughly $600/month. The relevance improvement is significant: in our benchmarks, adding reranking to hybrid retrieval improves top-3 precision by 8 to 15 percentage points. For search-heavy SaaS products where users rely on search to get their work done, that improvement directly translates to reduced support burden and higher retention.
Jina Reranker v2 is a solid open-source alternative if you want to self-host. It runs on modest GPU hardware and achieves about 85 to 90% of Cohere Rerank quality on English text. For budget-constrained startups, this is a reasonable starting point that you can upgrade later.
Vendor Showdown: Algolia vs Typesense vs Vector Database Approaches
Your choice of search infrastructure will shape your migration timeline, ongoing costs, and operational complexity. Here is an honest comparison of the most common approaches for SaaS products migrating to AI search.
Algolia: The Premium Managed Path
Algolia added AI search features through their NeuralSearch product, which layers semantic understanding on top of their keyword engine. The advantage is that if you are already on Algolia, the migration path is the simplest of any option: enable NeuralSearch on your existing index, and Algolia handles embedding, vector storage, and hybrid retrieval for you. The downside is cost. Algolia pricing starts at roughly $1 per 1,000 search requests, and NeuralSearch adds a premium on top. A SaaS product doing 500,000 searches per month is looking at $800 to $1,500/month for AI-enhanced search. For a detailed cost and feature comparison, see our Algolia vs Meilisearch vs Typesense breakdown.
Typesense: The Performance-First Open-Source Option
Typesense added vector search support in version 0.25, and it has matured significantly since then. You can store embedding vectors alongside your regular document fields, enabling true hybrid search within a single engine. Query latency stays in the 5 to 15ms range even with vector search enabled. Self-hosted Typesense on a $60 to $100/month VPS handles millions of documents comfortably. Typesense Cloud with high availability runs $150 to $400/month depending on your data size. The tradeoff: Typesense supports a limited set of embedding dimensions and similarity metrics compared to purpose-built vector databases. For most SaaS search use cases, this is not a limitation. But if you need advanced vector operations like filtered vector search with complex metadata predicates, you may outgrow it.
Purpose-Built Vector Databases (Pinecone, Weaviate, Qdrant)
If your search requirements go beyond standard document search into territory like multi-modal search, recommendation systems, or RAG-powered question answering, a dedicated vector database gives you the most flexibility. Pinecone is the fully managed option starting at $70/month for their Standard plan. Weaviate and Qdrant can be self-hosted for free. The tradeoff is that you are now managing two search systems: your existing keyword search (Elasticsearch, Postgres full-text, etc.) and a vector database. This increases architectural complexity, deployment overhead, and the surface area for bugs.
Our recommendation by company stage:
- Early-stage SaaS (under 50K users): Typesense self-hosted with built-in vector search. Total cost under $150/month. Simple to operate, fast, and handles hybrid search natively.
- Growth-stage SaaS (50K to 500K users): Typesense Cloud or Algolia NeuralSearch depending on whether you prioritize cost savings or operational simplicity. Budget $300 to $1,000/month.
- Enterprise SaaS (500K+ users): Dedicated vector database (Weaviate or Qdrant self-hosted) alongside Elasticsearch/OpenSearch for keyword search. Custom reranking pipeline. Budget $2,000 to $8,000/month for infrastructure, plus engineering time.
Latency and Cost Tradeoffs: What You Actually Pay for AI Search
Every layer of intelligence you add to search has a latency cost and a dollar cost. Understanding these tradeoffs lets you make informed decisions about which layers to implement immediately and which to defer. Here is the real breakdown for a SaaS product handling 10,000 searches per day.
Latency Budget
Your users expect search results in under 200ms, ideally under 100ms. Here is how the latency stacks up for each approach:
- BM25 keyword search only: 5 to 30ms total. This is your baseline.
- Vector search only: 10 to 40ms for the vector query, plus 20 to 80ms for the embedding API call to convert the query to a vector. Total: 30 to 120ms.
- Hybrid search (BM25 + vector in parallel): 30 to 120ms total, since both paths run concurrently and you wait for the slower one.
- Hybrid search + reranking: 80 to 220ms total. Reranking adds 50 to 100ms depending on the number of candidates and the model, so the slow end can exceed the 200ms budget.
The embedding API call is the latency bottleneck for most implementations. You can reduce it by self-hosting your embedding model (cuts latency from 20 to 80ms down to 5 to 15ms) or by caching embeddings for frequent queries. In our experience, the top 20% of queries in a SaaS product account for 80% of search volume, so a simple LRU cache on query embeddings eliminates the API call for most requests.
Monthly Cost Breakdown
For a SaaS product with 100,000 searchable documents and 10,000 daily searches:
- Embedding API (OpenAI text-embedding-3-large): about $0.40/month for query embeddings (3 million tokens at $0.13 per million), $2.60 one-time for document indexing. Negligible.
- Vector database (Typesense self-hosted): $60 to $100/month for a VPS with enough RAM for 100K vectors.
- Reranking (Cohere Rerank 3.5): $600/month ($20/day at 10,000 requests/day).
- Total for hybrid search without reranking: roughly $60 to $100/month.
- Total for hybrid search with reranking: roughly $660 to $700/month.
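If you want to rerun this arithmetic for your own volume, a small calculator helps. The sketch below uses the unit prices quoted in this article ($0.13 per million embedding tokens, $2 per 1,000 rerank requests, $60 to $100 for hosting) as defaults; the function and parameter names are illustrative, and you should replace the prices with your vendors' current rates.

```python
def monthly_search_cost(
    searches_per_day,
    tokens_per_query=10,
    embed_price_per_million_tokens=0.13,
    rerank_price_per_thousand=2.00,
    hosting_range=(60, 100),
    days=30,
    use_rerank=True,
):
    """Estimate recurring monthly cost in dollars; one-time indexing is excluded."""
    searches = searches_per_day * days
    embedding = searches * tokens_per_query / 1_000_000 * embed_price_per_million_tokens
    reranking = searches / 1_000 * rerank_price_per_thousand if use_rerank else 0.0
    low, high = hosting_range
    return {
        "embedding": round(embedding, 2),
        "reranking": round(reranking, 2),
        "total_low": round(low + embedding + reranking, 2),
        "total_high": round(high + embedding + reranking, 2),
    }

with_rerank = monthly_search_cost(10_000)
without_rerank = monthly_search_cost(10_000, use_rerank=False)
print(with_rerank)
print(without_rerank)
```

Varying `searches_per_day` shows that reranking scales linearly with traffic and dominates the bill, while query embedding stays close to zero.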
Compare this to the cost of bad search. If your null-result rate drops from 35% to under 5% and that prevents even 10 churned users per month at an average contract value of $100/month, the AI search upgrade pays for itself immediately. For a more detailed cost analysis including enterprise-scale scenarios, see our complete AI search cost breakdown.
Where to save money: Skip reranking initially. Ship hybrid search (BM25 + vector) first, measure your relevance metrics, and add reranking only if your top-3 precision is below 85%. For many SaaS products with well-structured content, hybrid search alone gets you to 88 to 92% relevance without the $600/month reranking cost.
Step-by-Step Migration Playbook: From Keyword Search to AI Search
This is the playbook we follow when migrating SaaS products from traditional search to AI-powered search. It is designed to minimize risk, avoid downtime, and let you validate improvements with real user data before committing fully.
Phase 1: Instrument and Baseline (Week 1 to 2)
Before you change anything, you need data. Add search analytics to your existing implementation if you do not already have them. Track these metrics for every search query: the raw query text, the number of results returned, which results users click (and their position), whether users refine their query after seeing results, and time to first click. Calculate your baseline null-result rate, mean reciprocal rank (MRR), and click-through rate at positions 1, 3, and 5. These numbers are your "before" snapshot. Every decision from here forward should be measured against them.
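As a starting point, the baseline metrics can be computed from a simple search log. The record shape below (`num_results` plus a list of 1-based clicked positions) is an assumed schema for illustration, and MRR here is click-based: it uses the highest-ranked clicked result per search and counts searches without a click as zero.

```python
def search_metrics(log, ks=(1, 3, 5)):
    """Compute null-result rate, click-based MRR, and click-through rate at each cutoff k."""
    total = len(log)
    null_rate = sum(1 for entry in log if entry["num_results"] == 0) / total
    # Reciprocal rank of the best (lowest-numbered) clicked position; no click scores 0.
    mrr = sum(1 / min(entry["clicks"]) for entry in log if entry["clicks"]) / total
    # Share of searches with at least one click at position k or better.
    ctr_at = {
        k: sum(1 for entry in log if any(pos <= k for pos in entry["clicks"])) / total
        for k in ks
    }
    return {"null_result_rate": null_rate, "mrr": mrr, "ctr_at": ctr_at}

# Hypothetical log entries; positions are 1-based.
log = [
    {"query": "invite teammate", "num_results": 0, "clicks": []},
    {"query": "export report", "num_results": 12, "clicks": [1]},
    {"query": "change billing email", "num_results": 8, "clicks": [4, 2]},
    {"query": "TKT-29471", "num_results": 1, "clicks": [1]},
]
baseline = search_metrics(log)
print(baseline)
```

Running the same function on the same log schema after each phase gives you a like-for-like comparison against this "before" snapshot.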
Phase 2: Set Up the Vector Pipeline (Week 3 to 4)
Choose your embedding model (start with OpenAI text-embedding-3-large unless you have a specific reason not to) and set up a background job to embed all your searchable documents. Store the vectors in your chosen vector database alongside the document IDs. Build a simple query endpoint that embeds the user query, runs a vector similarity search, and returns ranked results. Do not expose this to users yet. Instead, run it in shadow mode: for every production search query, run both your existing BM25 search and the new vector search, log both result sets, and compare offline. This shadow comparison will reveal where vector search excels and where it falls short compared to your current system.
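Shadow mode boils down to a thin wrapper: serve the production results, run the candidate path on the side, and record both. The sketch below is one way to structure it. The function names and the in-memory `comparison_log` list are assumptions for illustration; a real service would write to its logging pipeline and would likely run the shadow call asynchronously so it adds no user-facing latency.

```python
def shadow_search(query, production_search, candidate_search, sink, k=10):
    """Return production results unchanged; record the candidate's results for offline comparison."""
    served = production_search(query)[:k]
    try:
        shadow = candidate_search(query)[:k]
        error = None
    except Exception as exc:
        # A failure in the shadow path must never break production search.
        shadow, error = [], repr(exc)
    sink.append({
        "query": query,
        "served": served,
        "shadow": shadow,
        "overlap": len(set(served) & set(shadow)),  # shared document IDs in the top k
        "error": error,
    })
    return served

# Stub search functions standing in for the real BM25 and vector backends.
def bm25_stub(query):
    return ["doc_1", "doc_2", "doc_3"]

def vector_stub(query):
    return ["doc_3", "doc_4", "doc_1"]

comparison_log = []
results = shadow_search("invite teammate", bm25_stub, vector_stub, comparison_log)
print(results)
print(comparison_log[0]["overlap"])
```

Queries with low overlap are the interesting ones to review by hand: they show where the two systems disagree most.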
Phase 3: Build Hybrid Retrieval (Week 5 to 6)
Implement Reciprocal Rank Fusion (RRF) to merge results from your BM25 engine and your vector search. The standard formula is: for each document, calculate score = sum of 1/(k + rank) across both retrieval paths, where k = 60 and rank is the document's 1-based position in that path's results. A document missing from one path simply gets no contribution from it. Start with equal weights for BM25 and vector search (0.5/0.5), applied as a multiplier on each path's term. Run the hybrid system in shadow mode alongside your production search for at least one week. Compare the hybrid results against both BM25-only and vector-only results using your click data. In our experience, the hybrid system outperforms both individual approaches within the first week of shadow testing.
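The RRF formula fits in a few lines. This sketch implements the weighted variant (each path's 1/(k + rank) term is multiplied by that path's weight), which reduces to standard RRF when the weights are equal. The function name and the sample document IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into one list of (doc_id, score) pairs."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += weight / (k + rank)
    # Documents that appear in only one list still get a score from that list alone.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking], weights=[0.5, 0.5])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF only looks at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales. Documents found by both paths (doc_a, doc_c) rise above those found by only one.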
Phase 4: A/B Test in Production (Week 7 to 8)
Roll out the hybrid search to 10% of your users using a feature flag. Track the same metrics from Phase 1 for both the control group (old search) and the test group (hybrid search). Pay close attention to null-result rate, click-through rate, and search refinement rate (lower is better, because it means users found what they needed on the first try). If the test group shows improvement, ramp to 25%, then 50%, then 100% over the following two weeks. If metrics are flat or negative for specific query types, adjust your RRF weights before expanding.
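If you do not already have a feature-flag service, percentage rollouts can be done with deterministic hashing. In this sketch the function name and the salt string are assumptions. The useful property is that each user always lands in the same bucket, so anyone in the 10% group stays in the test group when you ramp to 25% and beyond.

```python
import hashlib

def in_rollout(user_id: str, percent: int, salt: str = "hybrid-search-v1") -> bool:
    """Deterministically assign a user to the test group for a given rollout percentage."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < percent

users = [f"user_{i}" for i in range(10_000)]
test_group = [u for u in users if in_rollout(u, 10)]
print(len(test_group) / len(users))  # close to 0.10
```

Changing the salt reshuffles the buckets, which is useful when you start a new experiment and do not want the same users to always be the early group.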
Phase 5: Add Reranking (Optional, Week 9 to 10)
If your hybrid search achieves top-3 precision above 85%, you may not need reranking. If it is below that threshold, add a reranking stage using Cohere Rerank 3.5 or a self-hosted Jina Reranker. Rerank the top 50 candidates from your hybrid retrieval. A/B test the reranked results against the non-reranked hybrid results. Expect an 8 to 15 percentage point improvement in top-3 precision, at the cost of 50 to 100ms additional latency and roughly $600/month at 10,000 daily searches.
Phase 6: Optimize and Iterate (Ongoing)
Search is never done. Set up weekly reporting on your core metrics: null-result rate, MRR, click-through rate, and search-driven task completion rate. Review the lowest-performing queries each week and use them to guide your next improvements. Common optimizations include fine-tuning RRF weights for different query categories, adding query classification to route exact-match queries to BM25 only (skipping the vector path for speed), implementing query embedding caching for popular searches, and periodically re-indexing your documents as content changes. The total migration timeline from kickoff to full rollout is 8 to 10 weeks for a team with one backend engineer dedicated to the project. If you want to accelerate this or avoid the common pitfalls we see teams hit during migration, that is exactly the kind of project we take on.
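One of the optimizations above, routing exact-match queries to BM25 only, can start as a simple rule-based classifier. The ID pattern below is an assumption modeled on identifiers like TKT-29471 and INV-2024-0847; adapt it to the formats your product actually uses. The route names are hypothetical labels for your own dispatch code.

```python
import re

# Assumed ID shape: 2 to 5 letters followed by one or more "-digits" groups.
ID_PATTERN = re.compile(r"^[A-Z]{2,5}(-\d+)+$")

def route_query(query: str) -> str:
    """Return 'bm25_only' for exact-match style queries, otherwise 'hybrid'."""
    q = query.strip()
    if ID_PATTERN.match(q.upper()):
        return "bm25_only"  # looks like a ticket or invoice ID
    if len(q) > 2 and q.startswith('"') and q.endswith('"'):
        return "bm25_only"  # user explicitly quoted a phrase
    return "hybrid"

for q in ["TKT-29471", "inv-2024-0847", '"add team members"', "how to invite someone"]:
    print(q, "->", route_query(q))
```

Start with rules like these and review misrouted queries in your weekly report before reaching for a learned classifier.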
Common Migration Pitfalls and How to Avoid Them
We have shipped enough of these migrations to know where teams get stuck. Here are the five most common pitfalls and how to sidestep them.
Pitfall 1: Going straight to vector search without preserving keyword matching. Pure vector search looks amazing in demos but fails on exact-match queries in production. Users who search for "INV-2024-0847" need that exact invoice, not semantically similar invoices. Always implement hybrid search, not vector-only. This is the single most common mistake we see.
Pitfall 2: Choosing an embedding model based on benchmarks alone. MTEB scores are useful directional indicators, but your domain-specific data may behave differently. A model that ranks first on general retrieval benchmarks might rank fourth on your specific document corpus. Always run a domain-specific evaluation on 200 to 500 real queries from your product before committing to a model.
Pitfall 3: Ignoring the re-indexing pipeline. Your SaaS content changes constantly. New help articles, updated product descriptions, modified settings, new user-generated content. If you embed your documents once and never update the vectors, your search quality degrades within weeks. Build an incremental indexing pipeline that re-embeds documents on create and update. Use a queue (SQS, Redis, or even a simple database table) to decouple embedding from your write path so you do not add latency to every content update.
Pitfall 4: Over-engineering the first version. You do not need reranking, query expansion, personalized embeddings, and a custom fine-tuned model on day one. Ship hybrid search with a commercial embedding model and off-the-shelf vector storage first. Measure results. Then add complexity only where the data shows you need it. Teams that try to build the ultimate search pipeline from scratch typically spend 4 to 6 months and end up with something more fragile than a simpler system shipped in 8 weeks.
Pitfall 5: Not tracking the right metrics. Page-level analytics like bounce rate are not granular enough to evaluate search quality. You need query-level metrics: null-result rate, MRR, precision at positions 1, 3, and 5, and search session success rate (did the user accomplish their task after searching?). Without these, you are flying blind and cannot make data-driven decisions about relevance tuning. Set up the instrumentation in Phase 1 and do not skip it.
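For Pitfall 3, the decoupled re-embedding pipeline can be sketched with an in-process queue and a content hash. Everything here is a simplified assumption: `queue.Queue` stands in for SQS or Redis, the dictionaries stand in for your vector store, and `fake_embed` replaces the real embedding call. The hash check skips documents whose text has not changed, which avoids paying to re-embed no-op saves.

```python
import hashlib
import queue

embed_queue = queue.Queue()  # stand-in for SQS, Redis, or a database table
indexed_hashes = {}          # doc_id -> hash of the text that was last embedded
vector_store = {}            # doc_id -> vector (stand-in for the vector database)

def on_document_saved(doc_id, text):
    """Called from the write path: enqueue only, never embed inline."""
    embed_queue.put((doc_id, text))

def fake_embed(text):
    """Hypothetical placeholder for the real embedding call."""
    return [float(len(text))]

def drain_queue(embed=fake_embed):
    """Worker step: embed queued documents whose content changed. Returns how many were embedded."""
    embedded = 0
    while not embed_queue.empty():
        doc_id, text = embed_queue.get()
        digest = hashlib.sha256(text.encode()).hexdigest()
        if indexed_hashes.get(doc_id) == digest:
            continue  # text unchanged since the last embedding, skip the API call
        vector_store[doc_id] = embed(text)
        indexed_hashes[doc_id] = digest
        embedded += 1
    return embedded

on_document_saved("help-1", "Add team members to your workspace")
on_document_saved("help-1", "Add team members to your workspace")  # duplicate save
on_document_saved("help-2", "Export a monthly report")
first_run = drain_queue()
print(first_run)  # 2: the duplicate save was skipped
```

In a real deployment the worker would run continuously in a separate process, and deletions would need their own event so stale vectors are removed.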
If you are planning a migration from traditional search to AI-powered search for your SaaS product and want to avoid these pitfalls, we can help. Our team has shipped hybrid search implementations for SaaS companies across industries, from 10K-user startups to products serving millions. Book a free strategy call and we will walk through your search architecture, identify the biggest quick wins, and scope out a migration plan tailored to your product.