How to Build an AI-Powered Browser Extension From Scratch 2026

Browser extensions sit closer to the user than any other software. Adding AI to that surface turns a simple utility into something that reads, summarizes, writes, and acts on the page the user is already looking at.

Nate Laquis

Founder & CEO

Why Browser Extensions Are the Perfect Surface for AI

Browser extensions have a superpower that standalone web apps do not: they live inside the page the user is already viewing. That means your AI has direct access to the content, context, and workflow the user cares about right now. No copy-pasting URLs into a separate tool. No switching tabs. The AI just works where you work.

This is why we have seen an explosion of AI-powered extensions in 2025 and 2026. Tools like Sider, Monica, Merlin, and MaxAI collectively have tens of millions of users. They prove the market is real and the UX pattern works. But most of them are thin wrappers around ChatGPT with a popup window. You can build something far more useful by going deeper into the browser platform.

The opportunity is especially strong for vertical use cases. A general-purpose "AI assistant in your browser" competes with a dozen established players. An AI extension built specifically for recruiters scanning LinkedIn profiles, or for procurement teams comparing vendor spec sheets, or for researchers annotating academic papers? That is a defensible product with users willing to pay $20 to $50 per month.

In this guide, we will walk through every layer of building an AI browser extension from scratch: the extension architecture, connecting to LLM APIs, extracting on-page context, building the UI, handling privacy, and shipping to the Chrome Web Store and beyond. This is the same playbook we use when building AI copilots for clients, adapted for the browser extension surface.

Manifest V3: The Foundation You Cannot Skip

Every Chrome extension built today must use Manifest V3. Google deprecated Manifest V2 in mid-2024 and has been progressively disabling V2 extensions in Chrome stable since early 2025. If someone points you to a tutorial that uses "manifest_version": 2, close the tab. It is outdated.

Manifest V3 introduced several changes that directly affect how you build AI features:

Service Workers Replace Background Pages

The persistent background page is gone. Instead, you get a service worker that Chrome spins up on demand and terminates after roughly 30 seconds of inactivity. This is the single biggest adjustment for developers coming from V2. Your background logic cannot hold state in memory between events. Everything must be persisted to chrome.storage or IndexedDB.
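
As a minimal sketch of that persistence rule, the helpers below keep conversation history in a key-value store instead of a module-level variable. The KeyValueStore interface mirrors only the promise-based get and set calls of chrome.storage.local; the storage key, the ChatTurn shape, and the in-memory store are our own illustrative assumptions.

```typescript
// Minimal shape of the promise-based chrome.storage.local calls used here.
interface KeyValueStore {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

const HISTORY_KEY = "chatHistory"; // hypothetical storage key

// Reads the saved history, falling back to an empty list on first run.
async function loadHistory(store: KeyValueStore): Promise<ChatTurn[]> {
  const result = await store.get(HISTORY_KEY);
  const saved = result[HISTORY_KEY];
  return Array.isArray(saved) ? (saved as ChatTurn[]) : [];
}

// Appends one turn and writes the whole list back, so nothing lives only in
// service worker memory between events.
async function appendTurn(store: KeyValueStore, turn: ChatTurn): Promise<ChatTurn[]> {
  const history = await loadHistory(store);
  const updated = [...history, turn];
  await store.set({ [HISTORY_KEY]: updated });
  return updated;
}

// In-memory stand-in for trying the helpers outside a browser.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, unknown> = {};
  return {
    async get(key) {
      return key in data ? { [key]: data[key] } : {};
    },
    async set(items) {
      Object.assign(data, items);
    },
  };
}
```

Inside the extension you would pass chrome.storage.local as the store. Reading before every write costs a little latency, but it means a freshly restarted worker never overwrites history it has not loaded yet.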

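For AI extensions, this matters because LLM API calls can take 5 to 15 seconds. If your service worker is in the middle of a streaming API call to Claude or GPT-4 when Chrome terminates it, the response gets cut off. One workaround is a long-lived port, opened with chrome.runtime.connect and received in chrome.runtime.onConnect. In current Chrome versions, each message sent over the port resets the worker's idle timer, so relaying every streamed chunk to the popup, side panel, or content script keeps the worker alive while tokens are flowing. Merely holding the port open is not a reliable keepalive. Alternatively, make the API call from a side panel script, which lives as long as the panel stays open. Content scripts are a weaker fit for this because their cross-origin requests are bound by the host page's CORS rules.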
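
One way to keep the worker busy while a response streams is to relay every chunk over a port as it arrives. The sketch below assumes the chunks come from some async iterable (for example a wrapper around your fetch stream). PortLike copies only the two members of chrome.runtime.Port that it needs, and the StreamMessage shape is our own convention, not part of any Chrome API.

```typescript
// Minimal shape of chrome.runtime.Port used by this sketch.
interface PortLike {
  postMessage(message: unknown): void;
  onDisconnect: { addListener(callback: () => void): void };
}

// Hypothetical message protocol between the worker and the UI.
type StreamMessage =
  | { type: "chunk"; text: string }
  | { type: "done" }
  | { type: "error"; message: string };

// Forwards each streamed chunk over the port. Each postMessage counts as
// extension activity, so the worker is not idle while tokens are flowing.
async function relayStream(port: PortLike, chunks: AsyncIterable<string>): Promise<void> {
  let connected = true;
  port.onDisconnect.addListener(() => {
    connected = false;
  });

  const send = (message: StreamMessage): void => {
    if (connected) port.postMessage(message);
  };

  try {
    for await (const text of chunks) {
      if (!connected) return; // UI went away: stop reading the stream
      send({ type: "chunk", text });
    }
    send({ type: "done" });
  } catch (error) {
    send({ type: "error", message: error instanceof Error ? error.message : String(error) });
  }
}

// Service worker wiring (browser only; streamFromProxy is a hypothetical helper):
//   chrome.runtime.onConnect.addListener((port) => {
//     void relayStream(port, streamFromProxy());
//   });
```

Stopping on disconnect matters as much as the keepalive: if the user closes the panel mid-response, there is no reason to keep paying for tokens nobody will read.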

Content Security Policy Restrictions

V3 tightens the Content Security Policy significantly. You cannot use eval(), inline scripts, or remote code loading. This means no loading a JavaScript SDK from a CDN in your extension pages. Everything must be bundled at build time. For AI extensions, this is fine since you will be making REST API calls to LLM providers, not loading their SDKs into extension pages.

Declarative Net Request Replaces webRequest Blocking

If your AI extension needs to intercept or modify network requests (for example, adding or stripping headers, or redirecting requests), you now use declarativeNetRequest instead of the old blocking webRequest API. The declarative approach is more limited: rules can block, redirect, upgrade, and modify headers, but they cannot rewrite response bodies. In exchange, it performs better because Chrome evaluates rules without waking your service worker.
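
To make the declarative model concrete, here is a hedged sketch of a single header rule written as a typed object. The local types cover only the subset of rule fields used here, and the header name and api.example.com host are placeholders we made up for illustration.

```typescript
// Local types mirroring the subset of declarativeNetRequest rule fields used here.
interface HeaderOperation {
  header: string;
  operation: "set" | "remove";
  value?: string;
}

interface NetRule {
  id: number; // must be unique and at least 1
  priority: number;
  action: { type: "modifyHeaders"; requestHeaders: HeaderOperation[] } | { type: "block" };
  condition: { urlFilter: string; resourceTypes: string[] };
}

const rules: NetRule[] = [
  {
    id: 1,
    priority: 1,
    // Tag outgoing requests to a (hypothetical) backend with a client header.
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header: "x-extension-client", operation: "set", value: "ai-sidepanel" }],
    },
    condition: { urlFilter: "||api.example.com/", resourceTypes: ["xmlhttprequest"] },
  },
];

// Cheap sanity check to run at build time before writing the rules file.
function hasValidIds(candidates: NetRule[]): boolean {
  const seen = new Set<number>();
  for (const rule of candidates) {
    if (!Number.isInteger(rule.id) || rule.id < 1 || seen.has(rule.id)) return false;
    seen.add(rule.id);
  }
  return true;
}

// JSON for a static ruleset file referenced from the manifest.
const rulesJson: string = JSON.stringify(rules, null, 2);
```

Because the rule is plain data, Chrome can apply it without running any of your code, which is where the performance benefit comes from. Modifying headers also requires host access to the matching origin.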

Permissions You Will Need

A typical AI browser extension requests these permissions in the manifest:

  • activeTab grants temporary access to the current tab when the user clicks your extension icon. This is the least-invasive permission and does not trigger a scary warning during installation.
  • sidePanel lets you open a persistent side panel UI (more on this in the UI section).
  • storage for persisting user preferences, API keys, and conversation history.
  • contextMenus for adding right-click actions like "Summarize selection" or "Explain this."
  • scripting for programmatically injecting content scripts into pages.

Avoid requesting broad host permissions like "<all_urls>" unless your extension genuinely needs to run on every page. Each broad permission triggers an additional warning during install and raises review scrutiny at the Chrome Web Store. Request only the origins you need, or better yet, use activeTab so the user grants access per click.
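
Pulling the permissions above together, a manifest might look roughly like the sketch below. It is written as a typed object so it can be checked at build time; the extension name, version, file paths, and the example.com origin are placeholders, while the keys are standard manifest.json fields.

```typescript
// Subset of Manifest V3 keys used by this example.
interface ExtensionManifest {
  manifest_version: 3;
  name: string;
  version: string;
  permissions: string[];
  optional_host_permissions?: string[];
  background: { service_worker: string; type?: "module" };
  side_panel?: { default_path: string };
  action?: { default_title: string };
}

const manifest: ExtensionManifest = {
  manifest_version: 3,
  name: "Page Summarizer (example)",
  version: "0.1.0",
  permissions: ["activeTab", "sidePanel", "storage", "contextMenus", "scripting"],
  // Requested at runtime only if the user opts in; nothing broad up front.
  optional_host_permissions: ["https://*.example.com/*"],
  background: { service_worker: "background.js", type: "module" },
  side_panel: { default_path: "sidepanel.html" },
  action: { default_title: "Open AI assistant" },
};

// Serialize to the JSON that would be written to manifest.json at build time.
const manifestJson: string = JSON.stringify(manifest, null, 2);
```

Note that there is no host_permissions entry at all: activeTab covers the click-to-use case, and anything broader is deferred to an optional request.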

Content Scripts, Side Panels, and Extension UI Patterns

Your AI extension needs a user interface. In 2026, you have three main surfaces to work with, and the best extensions combine all three.

Content Scripts: AI Embedded in the Page

Content scripts run inside web pages. They can read the DOM, inject UI elements, and respond to user interactions on the page itself. For AI extensions, content scripts are how you do things like:

  • Highlight text on any page and show a floating "Summarize" or "Translate" tooltip
  • Inject an AI writing assistant directly into Gmail's compose window or LinkedIn's message box
  • Add inline annotations, explanations, or fact-checks next to paragraphs of text
  • Overlay a smart reading pane that shows AI-generated summaries alongside the original content

Content scripts run in an isolated world by default, meaning they share the DOM with the host page but have their own JavaScript scope. This prevents conflicts with the page's scripts. The isolation covers JavaScript only: CSS you inject alongside your content script shares the cascade with the host page's stylesheets, so styles can leak in both directions. Use Shadow DOM to encapsulate your injected UI components and prevent that leakage.

Side Panels: The New Standard for AI Extension UIs

Chrome's Side Panel API (available since Chrome 114) is the best UI surface for AI extensions in 2026. It opens a persistent panel on the side of the browser window that stays open as the user navigates between pages. This is exactly the interaction pattern users expect from AI assistants.

The side panel is a full HTML page that you control. You can build it with React, Vue, Svelte, or plain HTML. It has access to all chrome.* APIs and can communicate with content scripts via message passing. Unlike popups, the side panel does not close when the user clicks elsewhere. Unlike content scripts, it has a clean, dedicated space that does not fight with the host page's layout.

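For most AI extensions, the side panel serves as the primary chat interface and results display, while content scripts handle on-page interactions like text selection and inline UI. The two communicate through message passing: content scripts send with chrome.runtime.sendMessage, the side panel reaches a specific tab's content script with chrome.tabs.sendMessage, and both sides listen on chrome.runtime.onMessage.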
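
A small typed protocol keeps that message passing manageable. The sketch below shows one possible shape: the message names and the PageReader interface are our own invention, and the handler is kept free of chrome.* calls so it can be exercised outside the browser.

```typescript
// Hypothetical request and response shapes shared by both sides.
type PanelRequest = { type: "GET_SELECTION" } | { type: "GET_PAGE_TEXT" };
type ContentResponse = { ok: true; text: string } | { ok: false; error: string };

// Abstraction over the page so the handler does not touch the DOM directly.
interface PageReader {
  selection(): string;
  pageText(): string;
}

// Runs in the content script: turns a side panel request into a response.
function handlePanelRequest(request: PanelRequest, page: PageReader): ContentResponse {
  switch (request.type) {
    case "GET_SELECTION": {
      const text = page.selection().trim();
      return text ? { ok: true, text } : { ok: false, error: "Nothing is selected." };
    }
    case "GET_PAGE_TEXT":
      return { ok: true, text: page.pageText() };
  }
}

// Content script wiring (browser only):
//   const domReader: PageReader = {
//     selection: () => window.getSelection()?.toString() ?? "",
//     pageText: () => document.body.innerText,
//   };
//   chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
//     sendResponse(handlePanelRequest(message, domReader));
//   });
//
// Side panel side (tabId is the active tab's id):
//   const reply = await chrome.tabs.sendMessage(tabId, { type: "GET_SELECTION" });
```

Returning an explicit ok flag instead of throwing lets the side panel show a friendly "select some text first" state without a try/catch around every call.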

Popups: Quick Actions Only

The classic extension popup (triggered by clicking the extension icon in the toolbar) still works for simple, quick-action interfaces. But for AI features that involve conversation, streaming responses, or complex output, the popup is too limiting. It closes as soon as the user clicks outside it, which is a terrible experience when you are waiting for a 10-second LLM response.

Use the popup for settings, quick toggles, and launching the side panel. Use the side panel for the core AI experience.

Connecting to LLM APIs and Handling Streaming Responses

The AI brain of your extension is an LLM API call. Here is how to wire it up properly in a browser extension context.

Where to Make the API Call

You have two architectural options, and the choice matters more than most tutorials acknowledge.

Option 1: Direct API calls from the extension. Your extension's service worker or side panel script calls the LLM API directly. The user provides their own API key (stored in chrome.storage), and all requests go straight from the browser to Anthropic, OpenAI, or whichever provider you use. This is the simplest architecture. No backend needed. No server costs. The downside: you cannot hide prompt templates, you cannot aggregate usage analytics easily, and each user needs their own API key.

Option 2: Proxy through your backend. Your extension calls your own API server, which then calls the LLM provider. This is what you want for a commercial product. You control the prompts, manage API keys centrally, handle billing, enforce rate limits, add caching, and collect analytics. The extension sends the extracted page content to your server, your server constructs the full prompt with your proprietary system instructions, calls the LLM, and streams the response back.

For anything beyond a personal tool or open-source project, go with Option 2. The added complexity of a backend pays for itself immediately in control, security, and monetization ability.

Streaming Responses

Users expect to see AI responses appear token by token, not wait 8 seconds for a complete response to pop in all at once. Both Anthropic and OpenAI support server-sent events (SSE) for streaming.

In the extension context, you fetch the streaming endpoint and process the ReadableStream:

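  • Use the Fetch API from your side panel or service worker, setting "stream": true in the JSON request body
  • Read response.body (a ReadableStream) and parse each SSE chunk to extract the delta text
  • Append tokens to the UI in real time
  • Handle the end-of-stream signal to finalize the response (OpenAI sends a data: [DONE] line, while Anthropic sends a message_stop event)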

If your architecture proxies through a backend, your backend streams from the LLM provider and re-streams to the extension. Use the same SSE format so the extension code stays consistent regardless of which LLM provider you call on the server side.
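
The parsing step is where most streaming bugs hide, because network chunks rarely line up with SSE event boundaries. Here is a minimal sketch of an incremental parser. It assumes your proxy emits data: lines whose JSON payload has a delta string and ends the stream with data: [DONE]; that payload shape is our assumption, and real provider events carry more fields.

```typescript
interface SseParser {
  // Feed one decoded network chunk; returns the text deltas completed so far.
  push(chunk: string): string[];
  isDone(): boolean;
}

function createSseParser(): SseParser {
  let buffer = "";
  let done = false;

  return {
    push(chunk: string): string[] {
      buffer += chunk;
      const deltas: string[] = [];

      // Events are separated by a blank line; keep any trailing partial event
      // in the buffer until the rest of it arrives.
      const events = buffer.split(/\r?\n\r?\n/);
      buffer = events.pop() ?? "";

      for (const event of events) {
        for (const line of event.split(/\r?\n/)) {
          if (!line.startsWith("data:")) continue;
          const payload = line.slice(5).trim();
          if (payload === "[DONE]") {
            done = true;
            continue;
          }
          try {
            const parsed = JSON.parse(payload) as { delta?: unknown };
            if (typeof parsed.delta === "string") deltas.push(parsed.delta);
          } catch {
            // Ignore malformed payloads rather than aborting the whole stream.
          }
        }
      }
      return deltas;
    },
    isDone: () => done,
  };
}

// Browser usage sketch (response comes from fetch):
//   const reader = response.body!.getReader();
//   const decoder = new TextDecoder();
//   const parser = createSseParser();
//   for (;;) {
//     const { value, done } = await reader.read();
//     if (done) break;
//     for (const delta of parser.push(decoder.decode(value, { stream: true }))) appendToUi(delta);
//   }
```

Here appendToUi stands in for whatever updates your side panel. Keeping the parser separate from the fetch loop means the same code works whether the stream is read in the side panel or relayed from the service worker.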

Model Selection for Browser Extensions

Cost and latency matter more in browser extensions than in most other AI products. Users trigger AI actions frequently (every page load, every text selection), and they expect near-instant results. Our recommendation:

  • Claude Haiku 4 or GPT-4o mini for quick actions like summarization, translation, and classification. Sub-second time to first token. Costs under $0.50 per million input tokens.
  • Claude Sonnet 4 for complex tasks like multi-page analysis, long-form writing, and detailed explanations. 1 to 3 second time to first token. Roughly $3 per million input tokens.
  • Claude Opus 4 or GPT-4o reserved for premium features like deep research, code analysis, or document comparison. Only trigger these on explicit user request, never automatically.

A user who summarizes 50 articles per day with Haiku costs you roughly $0.02 to $0.05 per day. That is a business model that works at $10/month pricing. If you accidentally route those same requests through Opus, you are looking at $0.50 to $1.00 per day per user, which eats your margin entirely.
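
A simple guard against that kind of accidental routing is to decide the model tier in one place. The sketch below uses our own task names and abstract tier labels rather than real model IDs, which you would map to provider models in your backend configuration.

```typescript
type Task = "summarize" | "translate" | "classify" | "long_form" | "multi_page" | "deep_research";
type Tier = "fast" | "balanced" | "premium";

// Default tier per task, following the three-way split described above.
const TASK_TIER: Record<Task, Tier> = {
  summarize: "fast",
  translate: "fast",
  classify: "fast",
  long_form: "balanced",
  multi_page: "balanced",
  deep_research: "premium",
};

function pickTier(task: Task, userRequestedPremium: boolean): Tier {
  const tier = TASK_TIER[task];
  // Premium models run only on an explicit user request, never automatically.
  if (tier === "premium" && !userRequestedPremium) return "balanced";
  return tier;
}
```

For example, pickTier("deep_research", false) falls back to the balanced tier, so a background or automatic trigger can never reach the most expensive model on its own.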

Extracting On-Page Context: The Secret Weapon

The reason an AI browser extension beats a standalone AI app is context. Your extension can see exactly what the user sees. But extracting useful context from arbitrary web pages is harder than it sounds.

DOM Extraction Strategies

The naive approach is to grab document.body.innerText and send the whole thing to the LLM. This works on simple pages but fails badly on complex ones. You get navigation menus, footer links, cookie banners, ad copy, and sidebar widgets mixed in with the actual content. The LLM wastes tokens processing junk and produces lower-quality results.

Better approaches:

  • Readability-style extraction. Use Mozilla's Readability library (the same algorithm behind Firefox Reader View) to extract the main article content from a page. It strips navigation, ads, and chrome, leaving clean text. This works extremely well for news articles, blog posts, and documentation pages.
  • Selection-based extraction. Only process what the user explicitly selects. Use window.getSelection() to capture the highlighted text and its surrounding context. This is the most precise approach and gives the user full control over what the AI sees.
  • Structured data extraction. Many pages include structured data in JSON-LD, OpenGraph meta tags, or schema.org markup. Parse these first for clean, structured context, then fall back to DOM extraction for the body content.
  • Targeted selectors. For specific sites your extension supports deeply (like LinkedIn, Amazon, or GitHub), write custom extractors that use CSS selectors or XPath to pull exactly the fields you need. A LinkedIn profile extractor can grab the name, headline, experience, and skills as structured data instead of a blob of text.
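
As an illustration of the structured data approach, the sketch below parses the raw contents of JSON-LD script tags into a small summary object. Only three common fields are read, and the StructuredContext shape is our own simplification; real JSON-LD can be nested (for example under @graph), which this example does not handle.

```typescript
interface StructuredContext {
  type: string;
  name?: string;
  description?: string;
}

// Takes the text content of each <script type="application/ld+json"> block.
function parseJsonLd(scriptContents: string[]): StructuredContext[] {
  const found: StructuredContext[] = [];

  for (const raw of scriptContents) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip malformed blocks, they are common in the wild
    }

    const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (typeof item !== "object" || item === null) continue;
      const record = item as Record<string, unknown>;
      const type = record["@type"];
      if (typeof type !== "string") continue;

      const name = record["name"];
      const description = record["description"];
      found.push({
        type,
        name: typeof name === "string" ? name : undefined,
        description: typeof description === "string" ? description : undefined,
      });
    }
  }
  return found;
}

// Content script usage (browser only):
//   const blocks = Array.from(
//     document.querySelectorAll('script[type="application/ld+json"]'),
//     (el) => el.textContent ?? "",
//   );
//   const context = parseJsonLd(blocks);
```

When this returns something useful, a few dozen tokens of clean metadata can replace hundreds of tokens of scraped text at the top of your prompt.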

Handling Dynamic Content

Modern web apps load content dynamically. A page might look empty in the initial HTML and only populate after JavaScript executes and API calls return. Your content script needs to wait for the content to actually appear before extracting it.

Use MutationObserver to watch for DOM changes and trigger extraction when the relevant content nodes appear. Set a reasonable timeout (5 to 10 seconds) so you do not wait forever on broken pages. Single-page applications change content without full page reloads, and history.pushState does not fire an event of its own. Detect those navigations with the popstate event plus chrome.webNavigation.onHistoryStateUpdated in the service worker (which requires the webNavigation permission), or use a MutationObserver on the main content container.
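
The wait-with-timeout pattern can be sketched independently of the DOM. In the helper below, probe and subscribe are parameters we introduce for illustration: probe checks whether the content exists yet, and subscribe registers a change callback and returns an unsubscribe function. The MutationObserver wiring is shown in the trailing comment.

```typescript
// Registers a change callback and returns a function that stops observing.
type Subscribe = (onChange: () => void) => () => void;

function waitForContent<T>(probe: () => T | null, subscribe: Subscribe, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // The content may already be present, in which case no observer is needed.
    const first = probe();
    if (first !== null) {
      resolve(first);
      return;
    }

    let unsubscribe: () => void = () => {};

    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error(`Content did not appear within ${timeoutMs} ms`));
    }, timeoutMs);

    unsubscribe = subscribe(() => {
      const value = probe();
      if (value === null) return; // unrelated change, keep waiting
      clearTimeout(timer);
      unsubscribe();
      resolve(value);
    });
  });
}

// Content script usage (browser only):
//   const observeDom: Subscribe = (onChange) => {
//     const observer = new MutationObserver(onChange);
//     observer.observe(document.body, { childList: true, subtree: true });
//     return () => observer.disconnect();
//   };
//   const article = await waitForContent(() => document.querySelector("article"), observeDom, 8000);
```

Disconnecting the observer on both success and timeout is the important part: a forgotten subtree observer on a busy page will fire on every DOM change for the life of the tab.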

Context Window Management

A long web page can contain 10,000+ words. Sending all of that to the LLM for every interaction is wasteful. Implement smart truncation:

  • For summarization, send the full content (up to the model's context limit)
  • For Q&A about a specific section, send only the relevant section plus a brief summary of the full page
  • For writing assistance, send the user's selected text plus 500 words of surrounding context

Always tell the LLM what you trimmed. A simple note like "The following is an excerpt from a longer page about [topic]" helps the model calibrate its response. If you are building a research agent that processes multiple pages, consider chunking and summarizing each page before combining them into a single analysis prompt.
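
For the writing-assistance case, the trimming can be sketched as below: keep the selection, add a budget of surrounding words split evenly before and after, and record whether anything was cut so the prompt can say so. The function names and the even split are our own choices, and the note wording simply follows the example sentence above.

```typescript
interface PromptContext {
  text: string;
  trimmed: boolean;
}

function splitWords(value: string): string[] {
  return value.split(/\s+/).filter((word) => word.length > 0);
}

// Keeps the selection plus up to contextWords of surrounding text.
function buildWritingContext(fullText: string, selected: string, contextWords = 500): PromptContext {
  const start = selected.length > 0 ? fullText.indexOf(selected) : -1;
  if (start === -1) {
    // Selection not found in the extracted text: send it alone.
    return { text: selected, trimmed: fullText.length > selected.length };
  }

  const half = Math.floor(contextWords / 2);
  const before = splitWords(fullText.slice(0, start));
  const after = splitWords(fullText.slice(start + selected.length));

  const keptBefore = before.slice(Math.max(0, before.length - half));
  const keptAfter = after.slice(0, half);

  const text = [keptBefore.join(" "), selected, keptAfter.join(" ")]
    .filter((part) => part.length > 0)
    .join(" ");

  return { text, trimmed: before.length > half || after.length > half };
}

// Prefixes a short note when content was cut, so the model knows it is partial.
function withTrimNote(context: PromptContext, topic: string): string {
  if (!context.trimmed) return context.text;
  return `The following is an excerpt from a longer page about ${topic}.\n\n${context.text}`;
}
```

Counting words is a rough proxy for tokens, but it is cheap, runs in the content script, and is close enough for deciding how much surrounding text to include.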

WXT, Plasmo, and Modern Extension Frameworks

You can absolutely build a browser extension from scratch with vanilla HTML, CSS, and JavaScript. But if you are building anything non-trivial, especially an AI extension with React-based UIs, streaming state management, and cross-browser support, a framework will save you weeks.

WXT (Web Extension Toolkit)

WXT is the framework we recommend for most projects in 2026. It is built on top of Vite, which means fast builds, hot module replacement during development, and a modern developer experience. Key features:

  • File-based entrypoints. Drop a file in the right directory and WXT automatically wires it into your manifest. A file at entrypoints/sidepanel/index.html becomes your side panel page. A file at entrypoints/content.ts becomes a content script. No manual manifest editing for most cases.
  • Cross-browser support built in. WXT compiles your extension for Chrome, Firefox, Edge, and Safari from a single codebase. It handles manifest differences (Chrome requires Manifest V3, while Firefox still accepts V2 or V3), API namespace differences (chrome.* vs browser.*), and platform-specific quirks automatically.
  • TypeScript first. Full type definitions for all WebExtension APIs. Your IDE catches permission errors and API misuse before you even run the extension.
  • Dev mode with HMR. Change your side panel React component, and it hot-reloads instantly without reloading the entire extension. This cuts development time significantly compared to the traditional "change code, rebuild, reload extension, navigate to test page" cycle.

WXT is open source, MIT licensed, and has a growing community. It currently has over 5,000 GitHub stars and is used by several production extensions.

Plasmo

Plasmo takes a more opinionated, batteries-included approach. It bills itself as the "Next.js for browser extensions," and the comparison is apt. Plasmo gives you:

  • CSUI (Content Script UI). A declarative way to mount React, Vue, or Svelte components directly into web pages via content scripts. You export a React component, and Plasmo handles Shadow DOM encapsulation, lifecycle management, and style isolation automatically.
  • Messaging API. A typed, promise-based messaging system between content scripts, background workers, and extension pages. No more raw chrome.runtime.sendMessage with untyped payloads.
  • Storage API. A reactive storage wrapper that works like React state but persists to chrome.storage. Change a value in the side panel, and your content script UI updates automatically.
  • Built-in publishing. Plasmo includes tooling to package and submit your extension to the Chrome Web Store, Firefox Add-ons, and Edge Add-ons from the command line.

The tradeoff with Plasmo is vendor lock-in to their abstractions. If Plasmo does not support a pattern you need, you may find yourself fighting the framework. WXT is more transparent and closer to the native WebExtension APIs.

Our Recommendation

For AI browser extensions specifically, we lean toward WXT. The lighter abstraction layer gives you more control over message passing, streaming, and service worker lifecycle management, all of which are critical for AI features. Plasmo is the better choice if your extension is primarily a content script UI (like a page annotator or inline writing tool) and you want to minimize boilerplate. Either framework is a massive improvement over building from scratch.

Cross-Browser Support and Chrome Web Store Publishing

Chrome dominates browser market share at roughly 65%, but ignoring Firefox (7%) and Edge (5%) means leaving users on the table, especially in enterprise environments where Edge adoption is significant. The good news: if you use WXT or Plasmo, cross-browser support is mostly handled for you.

Firefox Considerations

Firefox supports both Manifest V2 and V3, and Mozilla has been more conservative about deprecating V2. The Firefox implementation of V3 differs from Chrome in a few important ways:

  • Firefox uses the browser.* namespace with promise-based APIs by default, while Chrome uses chrome.* with callback-based APIs (though Chrome now supports promises too). WXT and Plasmo abstract this away.
  • Firefox does not support the Side Panel API as of early 2026. Instead, you can use a sidebar (browser.sidebarAction) which has existed in Firefox for years. The UX is similar but the API surface is different.
  • Firefox Add-ons review is typically faster than Chrome Web Store review, often completing within a few hours rather than days.
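
If you handle the side panel difference yourself rather than through a framework, simple feature detection is usually enough. The sketch below checks which panel API the extension namespace exposes; the function and type names are ours, and the popup fallback is an assumption about what you would want on browsers with neither API.

```typescript
// Only the two members we probe for; the real namespaces expose far more.
interface ExtensionApi {
  sidePanel?: unknown;
  sidebarAction?: unknown;
}

type PanelSurface = "sidePanel" | "sidebarAction" | "popup";

function pickPanelSurface(api: ExtensionApi): PanelSurface {
  if (api.sidePanel !== undefined) return "sidePanel"; // Chrome, Edge
  if (api.sidebarAction !== undefined) return "sidebarAction"; // Firefox
  return "popup"; // neither panel API is available
}

// Usage inside the extension (browser only):
//   const surface = pickPanelSurface(globalThis.browser ?? globalThis.chrome);
```

The same HTML page can back both surfaces. What differs is the manifest key (side_panel in Chrome, sidebar_action in Firefox) and the call used to open it.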

Edge Considerations

Edge is Chromium-based, so Chrome extensions work on Edge with minimal changes. Microsoft's Edge Add-ons store accepts Manifest V3 extensions, and you can often submit the exact same package you submit to the Chrome Web Store. The review process is comparable to Chrome in speed and scrutiny.

Publishing to the Chrome Web Store

Here is what to expect when publishing your AI extension to the Chrome Web Store:

  • Developer registration. One-time $5 fee to create a developer account. You need to verify your identity with a phone number.
  • Review timeline. First-time submissions typically take 3 to 7 business days. Updates to existing extensions review in 1 to 3 days. Extensions requesting broad permissions or sensitive APIs take longer.
  • Privacy requirements. You must provide a privacy policy URL. Google requires you to disclose all data collection, and your extension must comply with their Limited Use policy if you access user browsing data.
  • Single-purpose policy. Chrome Web Store enforces a "single purpose" rule. Your extension must have one clear function. An AI extension that summarizes pages, manages bookmarks, blocks ads, and tracks prices will get rejected. Pick a lane.

Package your extension as a .zip file containing your manifest.json and all bundled assets. Both WXT and Plasmo generate this with a single build command. Upload through the Chrome Web Store Developer Dashboard, fill in the listing details (description, screenshots, category), and submit for review.

Pro tip: prepare 1280x800 screenshots and a 440x280 promotional tile before you start the submission process. The dashboard requires these, and scrambling to create them at the last minute leads to bad first impressions in the store listing. If you are deciding between a Chrome extension and a web app, keep in mind that store listing quality directly affects install rates.

Privacy, Security, and Responsible AI in Extensions

Browser extensions have elevated access to user data. An extension with the right permissions can read every page the user visits, capture form inputs, and access authentication tokens. This power comes with serious responsibility, and real consequences if you get it wrong.

Data Minimization

Only extract and transmit the minimum page content needed for your AI feature to work. If your extension summarizes articles, do not also collect the URLs of every page the user visits. If your extension helps with email writing, do not read emails the user did not explicitly ask you to process.

Be specific in your manifest permissions. Use activeTab instead of "<all_urls>" whenever possible. Request optional permissions via chrome.permissions.request at runtime, explaining to the user exactly why you need them before asking.
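
A hedged sketch of that runtime flow follows. PermissionsApi mirrors just the contains and request calls of chrome.permissions in their promise form, and the helper names and the granted/denied result are our own. The origin must also be listed under optional_host_permissions in the manifest.

```typescript
// Minimal shape of the chrome.permissions calls used here.
interface PermissionsApi {
  contains(permissions: { origins: string[] }): Promise<boolean>;
  request(permissions: { origins: string[] }): Promise<boolean>;
}

// Safe to call at any time, for example to decide whether to show the prompt.
async function hasOriginAccess(permissions: PermissionsApi, origin: string): Promise<boolean> {
  return permissions.contains({ origins: [origin] });
}

// Call this directly from the click handler of your own "Allow access" button,
// after the explanation has been shown: the browser expects a user gesture.
async function requestOriginAccess(
  permissions: PermissionsApi,
  origin: string,
): Promise<"granted" | "denied"> {
  const granted = await permissions.request({ origins: [origin] });
  return granted ? "granted" : "denied";
}

// Side panel usage (browser only; allowButton is a hypothetical element):
//   allowButton.addEventListener("click", async () => {
//     const result = await requestOriginAccess(chrome.permissions, "https://*.example.com/*");
//     if (result === "denied") showLimitedMode();
//   });
```

Treat a denial as a normal outcome rather than an error: the extension should keep working in a reduced mode, and the user can be offered the prompt again later from settings.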

API Key Security

If your extension calls LLM APIs directly (without a backend proxy), users will enter their API keys into your extension. Store these keys in chrome.storage.local, never in chrome.storage.sync (which syncs across devices and could leak keys through compromised Google accounts). Better yet, use chrome.storage.session, which is held in memory, never written to disk, and cleared when the browser closes. The tradeoff is that the user has to enter the key again in each browser session.

If you use a backend proxy, never expose your server's LLM API keys to the extension. Authenticate users with your own auth system (OAuth, JWT) and let the backend handle all LLM provider credentials.

Content Sent to LLM Providers

This is the privacy issue most AI extension developers overlook. When your extension sends page content to an LLM API, you are transmitting the user's browsing data to a third party. Your privacy policy must disclose this clearly. For enterprise users, this can be a dealbreaker if the data includes confidential business information, medical records, financial data, or legal documents.

Mitigations:

  • Let users review what will be sent before the API call. Show them the extracted text in the side panel and let them edit or redact sensitive parts.
  • Offer a "local only" mode that uses a smaller on-device model (Chrome's built-in Gemini Nano, or a WebAssembly model) for basic tasks. On-device inference keeps data entirely in the browser.
  • Use Anthropic's or OpenAI's data usage policies to your advantage. Both providers offer zero-retention API tiers where input data is not stored or used for training. Document this in your privacy policy.
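
The review step pairs well with automatic pre-redaction, so users start from text that already has the obvious identifiers masked. The sketch below is deliberately simple: the two patterns (email addresses and card-like digit runs) are illustrative assumptions, not a complete PII detector, and they will miss plenty of sensitive data.

```typescript
interface RedactionResult {
  text: string;
  redactions: number;
}

// Illustrative patterns only; extend or replace them for your own domain.
const REDACTION_PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // 13 to 16 digits, optionally separated by single spaces or hyphens.
  { label: "CARD", regex: /\b\d(?:[ -]?\d){12,15}\b/g },
];

// Replaces each match with a labeled placeholder and counts the replacements.
function redactSensitive(input: string): RedactionResult {
  let text = input;
  let redactions = 0;
  for (const { label, regex } of REDACTION_PATTERNS) {
    text = text.replace(regex, () => {
      redactions += 1;
      return `[${label} REDACTED]`;
    });
  }
  return { text, redactions };
}
```

Showing the redaction count next to the preview in the side panel gives users a quick signal that something was masked, and a reason to read the text before they press send.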

Chrome's Built-in AI APIs

Starting in 2025, Chrome began shipping built-in AI capabilities as on-device APIs. They are exposed as JavaScript globals such as Summarizer, Translator, and LanguageDetector, along with a Prompt API (LanguageModel) available to extensions, rather than under a chrome.ai namespace. Summarization and prompting are powered by Gemini Nano running locally. The quality is lower than cloud models, but the privacy story is compelling: zero data leaves the browser.

A smart architecture offers both. Use Chrome's built-in AI for quick, privacy-sensitive tasks (summarizing a page the user is reading, translating a paragraph). Escalate to cloud models when the user explicitly requests deeper analysis or longer output. This hybrid approach gives you the best of both worlds: privacy by default, power on demand.

Costs, Timeline, and Next Steps

Let us break down what it actually takes to build and ship an AI browser extension in 2026.

Development Timeline

  • MVP (4 to 6 weeks): Single AI feature (like page summarization or writing assistance), side panel UI, one browser target (Chrome), basic settings page, backend proxy with auth. This is enough to validate the concept with real users.
  • V1 (8 to 12 weeks): Multiple AI features, content script UI with inline interactions, cross-browser support (Chrome + Firefox + Edge), conversation history, user accounts and billing, polished onboarding flow.
  • Mature product (3 to 6 months): Site-specific extractors for your target domains, multi-model routing for cost optimization, team/enterprise features, analytics dashboard, API for third-party integrations.

Infrastructure Costs

A backend proxy for an AI extension is lightweight. A single $20/month server on Railway or Fly.io handles thousands of concurrent users. Your real cost is LLM API usage:

  • 1,000 daily active users each making 10 AI requests/day with Claude Haiku: roughly $15 to $30/month in API costs
  • 1,000 daily active users each making 10 AI requests/day with Claude Sonnet: roughly $150 to $300/month in API costs
  • 10,000 daily active users with a Haiku/Sonnet mix (80/20): roughly $300 to $600/month

At a $10/month subscription price with even 5% of daily active users on paid plans, 10,000 DAU generates $5,000/month in revenue against $300 to $600 in LLM costs. The margins work.
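
Because these estimates swing widely with tokens per request, it is worth keeping the arithmetic in a small function you can rerun with your own numbers. Everything passed in below is an input you supply: the token counts, the per-million-token prices, and the paid conversion rate are assumptions of the example, not provider quotes.

```typescript
interface UsageAssumptions {
  dailyActiveUsers: number;
  requestsPerUserPerDay: number;
  inputTokensPerRequest: number;
  outputTokensPerRequest: number;
  daysPerMonth: number;
}

interface ModelPrice {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Monthly LLM spend = requests x tokens per request x price per token.
function monthlyLlmCostUsd(usage: UsageAssumptions, price: ModelPrice): number {
  const requests = usage.dailyActiveUsers * usage.requestsPerUserPerDay * usage.daysPerMonth;
  const inputCost = ((requests * usage.inputTokensPerRequest) / 1_000_000) * price.inputPerMillionUsd;
  const outputCost = ((requests * usage.outputTokensPerRequest) / 1_000_000) * price.outputPerMillionUsd;
  return inputCost + outputCost;
}

// Monthly subscription revenue from the paying share of daily active users.
function monthlyRevenueUsd(dailyActiveUsers: number, paidShare: number, priceUsd: number): number {
  return dailyActiveUsers * paidShare * priceUsd;
}
```

With 10,000 daily active users, a 5% paid share, and a $10 price, monthlyRevenueUsd returns the $5,000 figure above. Run monthlyLlmCostUsd with your measured token counts per request before trusting any margin estimate, since full-page summaries use far more input tokens than short selections.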

Team Requirements

A solo full-stack developer can build and ship an MVP. For a V1, you want at least two developers (one focused on the extension frontend, one on the backend and AI integration) plus a designer who understands extension UX patterns. Extensions have unique design constraints: limited viewport width in popups and side panels, overlay UI that must work on any website's visual design, and platform-specific UI guidelines.

Getting Started

If you are planning an AI browser extension, here is the shortest path to something real:

  • Set up a WXT project with TypeScript and React. Run npx wxt@latest init and choose the React template.
  • Build a side panel that takes user input, sends it to Claude's API (directly, no backend needed for prototyping), and streams the response.
  • Add a content script that extracts the current page's text using Readability and sends it to the side panel via message passing.
  • Combine the two: user opens the side panel, clicks "Summarize this page," the content script extracts the text, the side panel sends it to Claude, and streams the summary back.

That is your working prototype in a weekend. From there, you layer on the backend proxy, additional AI features, cross-browser support, and polish.

Building an AI-powered browser extension is one of the highest-leverage projects a development team can take on right now. The platform is mature, the tools are excellent, and users are actively looking for AI that works where they work. If you want help turning your extension idea into a shipped product, book a free strategy call and we will map out the architecture, timeline, and costs together.
