---
title: "From Vibe Code to Production: Securing AI-Generated Apps 2026"
author: "Nate Laquis"
author_role: "Founder & CEO"
date: "2030-02-16"
category: "AI & Strategy"
tags:
  - vibe coding security
  - AI-generated code
  - code quality
  - application security
  - AI development tools
excerpt: "Y Combinator reports that 95% of code in their latest batch is AI-generated, but nearly half contains serious vulnerabilities. This guide breaks down the exact security review workflows, automated scanning tools, and refactoring strategies you need to turn vibe-coded prototypes into production-grade software."
reading_time: "14 min read"
canonical_url: "https://kanopylabs.com/blog/vibe-coding-to-production-quality-guide"
---

# From Vibe Code to Production: Securing AI-Generated Apps 2026

## The 95% AI-Generated Code Problem

Y Combinator's latest batch made headlines when partners revealed that 95% of the code across their portfolio companies was generated by AI. Cursor, Bolt, Lovable, and a growing list of [vibe coding tools](/blog/vibe-coding-tools-cursor-vs-bolt-vs-lovable) have fundamentally changed how startups build software. A solo founder can now ship a working MVP in a weekend. That is genuinely remarkable, and it is also genuinely dangerous.

Here is the number that should keep you up at night: roughly 45% of AI-generated code contains at least one exploitable vulnerability. That figure comes from multiple independent audits of LLM-produced codebases, and it aligns with what we see in our own security reviews. The vulnerabilities are not obscure edge cases. They are the basics: hardcoded API keys, missing authentication checks, SQL injection, insecure direct object references, and unvalidated user input flowing directly into database queries.

The gap between a vibe-coded MVP and a production-grade application is the single biggest challenge facing AI-era startups. You can go from zero to demo in hours, but going from demo to production-ready takes weeks of disciplined security work that most founders skip entirely. They launch fast, acquire users, and then discover their app has been leaking customer data through an endpoint that never checked if the requesting user actually owned the resource they were accessing.

![Developer writing code on a monitor showing software development workflow](https://images.unsplash.com/photo-1555949963-ff9fe0c870eb?w=800&q=80)

This guide is not about discouraging vibe coding. We use these tools daily and recommend them to clients. It is about closing the gap between what AI generates and what your users deserve: software that does not compromise their data. We will walk through the exact security review workflows, automated scanning pipelines, and refactoring strategies we use to take AI-generated codebases from prototype to production.

## The Most Common Vulnerability Patterns in AI-Generated Code

Before you can fix the problems, you need to know what to look for. After auditing dozens of AI-generated codebases across Cursor, Bolt, Lovable, and Claude-based workflows, we see the same vulnerability patterns over and over. LLMs are remarkably consistent in the mistakes they make, which is actually good news because it means you can build systematic defenses.

**Hardcoded secrets and API keys.** This is the most common issue by far. When you prompt an AI to integrate with Stripe, SendGrid, or any third-party API, the generated code frequently includes placeholder keys inline or, worse, the actual keys you pasted into the chat context. We reviewed one codebase where the OpenAI API key was hardcoded in three separate files, and the Supabase service role key (which bypasses all row-level security) was embedded directly in a client-side React component. That key was committed to a public GitHub repository. The fix is simple: environment variables, a .env file excluded from version control, and a pre-commit hook that scans for high-entropy strings. But AI tools almost never set this up correctly on the first pass.
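A minimal sketch of the corrected pattern for a Node.js/TypeScript project (the file and variable names are illustrative):

```typescript
// lib/openai-key.ts (illustrative) -- the key lives only in the environment.
// Locally it comes from a .env file listed in .gitignore; in production it
// comes from the deployment platform's secret store.
const apiKey = process.env.OPENAI_API_KEY;

if (!apiKey) {
  // Fail loudly at startup instead of crashing mid-request later.
  throw new Error("OPENAI_API_KEY is not set");
}

export const OPENAI_API_KEY: string = apiKey;
```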

**Missing or broken authentication checks.** AI-generated API routes often lack proper authorization. The LLM creates a perfectly functional CRUD endpoint but forgets to verify that the user making the request actually has permission to access or modify the resource. We see this constantly with Next.js API routes and Express handlers: the route works, the data flows, but there is zero middleware checking the session token or verifying resource ownership. A real example from a client audit: their AI-generated endpoint at `/api/users/[id]/billing` returned the full billing history for any user ID passed in the URL. No auth check. Any logged-in user could view any other user's payment history by changing the ID parameter.
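Here is a sketch of what the fixed endpoint could look like as a Next.js App Router route handler (Next 13/14-style signature); `getSession` and `getBillingHistory` are placeholders for whatever auth and data layers the project actually uses:

```typescript
// app/api/users/[id]/billing/route.ts
import { NextResponse } from "next/server";
import { getSession } from "@/lib/auth";           // hypothetical session helper
import { getBillingHistory } from "@/lib/billing"; // hypothetical data access

export async function GET(
  _req: Request,
  { params }: { params: { id: string } }
) {
  const session = await getSession();
  if (!session) {
    // Authentication: is anyone logged in at all?
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  if (session.userId !== params.id) {
    // Authorization: does this user own the resource in the URL?
    return NextResponse.json({ error: "Forbidden" }, { status: 403 });
  }
  return NextResponse.json(await getBillingHistory(params.id));
}
```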

**SQL injection and NoSQL injection.** LLMs frequently build queries using string concatenation instead of parameterized queries. This happens more often with raw SQL than with ORMs, but we have seen it with MongoDB queries too. The AI writes code that technically works during development, where inputs are clean, but crumbles the moment a malicious actor sends a crafted payload. Prisma and Drizzle ORM help prevent this at the framework level, but AI-generated raw queries remain a serious risk.
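The difference in one sketch, using the `pg` driver (the same principle applies to any database client):

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details come from PG* env vars

// Vulnerable: user input is spliced into the SQL string itself.
//   pool.query(`SELECT * FROM users WHERE email = '${email}'`)

// Safe: the value travels separately from the query text, so it can
// never be interpreted as SQL.
export async function findUserByEmail(email: string) {
  const { rows } = await pool.query(
    "SELECT id, name, email FROM users WHERE email = $1",
    [email]
  );
  return rows[0] ?? null;
}
```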

**Insecure data exposure in API responses.** AI-generated endpoints frequently return entire database objects instead of selecting specific fields. Your user profile endpoint returns the password hash, internal flags, admin status, and every other column alongside the name and email. The LLM optimizes for "making it work" and grabs everything from the database. It does not consider what should or should not be sent to the client.
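With Prisma, for example, the fix is an explicit `select`; this sketch assumes a `user` model with the listed fields:

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

export function getPublicProfile(userId: string) {
  // An explicit select means passwordHash, isAdmin, and any future
  // internal columns never reach the client by accident.
  return prisma.user.findUnique({
    where: { id: userId },
    select: { id: true, name: true, email: true },
  });
}
```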

**Missing rate limiting and input validation.** Almost no AI-generated code includes rate limiting out of the box. Your sign-up endpoint, your login route, your password reset flow: all wide open for brute force attacks and abuse. Input validation is similarly absent. The AI trusts that incoming data will match the expected shape, and it rarely adds schema validation with libraries like Zod or Joi.

## Building a Security Review Workflow for AI Code

Knowing the vulnerability patterns is step one. Step two is building a workflow that catches them before they reach production. The key insight is that AI-generated code requires a different review approach than human-written code. Human developers make varied, context-dependent mistakes. AI makes systematic, predictable mistakes. Your review process should be optimized for that predictability.

**Step 1: Automated scanning on every commit.** Set up a CI pipeline that runs static analysis on every push. At minimum, you need three tools: Semgrep for pattern-based vulnerability detection, Gitleaks or TruffleHog for secrets scanning, and ESLint with security-focused rulesets (like eslint-plugin-security) for JavaScript and TypeScript projects. These three tools together catch 70-80% of the common AI-generated vulnerabilities with zero manual effort. Semgrep is particularly effective because you can write custom rules targeting the exact patterns LLMs produce. We maintain a ruleset specifically for AI-generated code patterns that flags things like unparameterized queries, missing auth middleware on route handlers, and overly permissive CORS configurations.
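For a flavor of what those custom rules look like, here is a minimal Semgrep rule of our own devising (not part of Semgrep's stock registry) that flags string-built SQL in TypeScript:

```yaml
rules:
  - id: ai-gen-sql-string-concat
    languages: [typescript]
    severity: ERROR
    message: SQL built by string concatenation; use a parameterized query.
    pattern-either:
      - pattern: '$DB.query("..." + $X, ...)'
      - pattern: '$DB.query(`...${$X}...`, ...)'
```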

**Step 2: Dependency auditing.** AI tools are notorious for pulling in outdated or vulnerable dependencies. The LLM's training data has a cutoff, so it frequently recommends package versions with known CVEs. Run `npm audit` or `yarn audit` on every build. Better yet, integrate Snyk into your CI pipeline for continuous dependency monitoring. Snyk catches vulnerabilities that npm audit misses and provides actionable fix recommendations. For Python projects, Safety and pip-audit serve the same purpose. Budget $25 to $50 per month for Snyk on a small project. It pays for itself the first time it catches a critical dependency vulnerability.

![Security compliance dashboard displaying vulnerability scanning and audit results](https://images.unsplash.com/photo-1563986768609-322da13575f2?w=800&q=80)

**Step 3: Manual review focused on auth and data boundaries.** Automated tools cannot fully evaluate business logic. A human reviewer needs to verify three things for every endpoint: Does it check authentication? Does it verify authorization (resource ownership)? Does it return only the data the requesting user should see? We use a simple checklist approach. Every API route gets tagged with its auth requirements during code review, and any route that touches user data or performs mutations gets a dedicated authorization review. This manual step adds 30 to 60 minutes per review cycle, but it catches the logic-level vulnerabilities that scanners miss entirely.

**Step 4: Pre-deployment security gates.** Before any code reaches production, it must pass through a quality gate that blocks deployment if critical issues are found. We configure this as a required status check in GitHub Actions: if Semgrep finds a high-severity issue, or if Snyk detects a critical CVE in a dependency, the merge is blocked. No exceptions, no manual overrides except from the security lead. This single practice has prevented more production incidents than any other measure we have implemented.

## Automated Scanning Tools That Actually Work

The security tooling landscape is crowded with options, and most founders waste time evaluating tools instead of deploying one and iterating. Here is the stack we have validated across dozens of AI-generated codebases, with honest assessments of what each tool catches and what it misses.

**Semgrep (free tier available, $40/month for Teams).** This is the single most valuable tool for scanning AI-generated code. Semgrep uses pattern matching to find vulnerabilities, and it supports custom rules that let you encode exactly the mistakes LLMs make. Out of the box, Semgrep catches SQL injection, XSS, insecure cryptography, and path traversal. With custom rules, you can flag patterns like Express routes without authentication middleware, React components that dangerously set inner HTML, and API handlers that return full database objects. We run Semgrep in CI and locally as a pre-commit hook. Detection rate for common AI-generated vulnerabilities: approximately 65-75%.

**CodeQL (free for open source, included with GitHub Advanced Security).** GitHub's CodeQL performs deeper semantic analysis than Semgrep. Instead of pattern matching, it builds a queryable database of your code's data flow and control flow, then runs queries that trace how tainted input propagates through your application. This makes it excellent at catching injection vulnerabilities where user input passes through multiple functions before reaching a dangerous sink like a database query or system call. The downside is speed: CodeQL analysis can take 5 to 15 minutes on a medium-sized codebase, so we run it on pull requests rather than every commit. Detection rate for data-flow vulnerabilities: approximately 80-85%.

**Snyk (free tier for individual developers, $25/month for Teams).** Snyk specializes in dependency vulnerabilities and container security. For AI-generated projects, the dependency scanning is essential because LLMs consistently recommend outdated packages. Snyk monitors your dependency tree continuously and alerts you when new CVEs are published against packages in your lockfile. It also provides automated fix PRs that bump the vulnerable dependency to a patched version. We run Snyk in CI and enable Slack notifications for critical vulnerabilities. It has caught pre-auth remote code execution vulnerabilities in transitive dependencies that would have been invisible without automated scanning.

**Gitleaks (free, open source).** Purpose-built for detecting secrets in git repositories. Gitleaks scans your entire commit history, not just the current codebase, which matters because AI-generated commits often contain secrets that were later removed but still exist in git history. Run it as a pre-commit hook to prevent secrets from being committed in the first place, and run a full repository scan periodically to catch anything that slipped through. We pair Gitleaks with a custom allowlist file that reduces false positives from test fixtures and example configurations.

**The practical setup.** For a typical AI-generated Next.js or Express project, our CI pipeline runs in this order: Gitleaks (secrets scan, 10 seconds), ESLint with security rules (30 seconds), Semgrep (1 to 2 minutes), npm audit (15 seconds), and Snyk (1 to 2 minutes). CodeQL runs separately on pull requests. Total CI time added: under 5 minutes. Total monthly cost for a small team: $50 to $100. That is the cheapest insurance you will ever buy.

## Refactoring AI-Generated Spaghetti Code

Security is only half the battle. AI-generated codebases have a second, equally serious problem: structural quality. LLMs generate code one prompt at a time, with limited awareness of the broader architecture. The result is what we call "prompt-driven spaghetti": code that works in isolation but creates a tangled, unmaintainable mess once assembled. Duplicated logic across files, inconsistent error handling patterns, mixed abstraction levels in single functions, and components that are tightly coupled for no reason.

The refactoring process needs to be systematic, not ad hoc. Rewriting everything from scratch defeats the purpose of using AI tools in the first place. Instead, focus your refactoring effort on the areas that matter most for maintainability and security.

**Extract and centralize authentication.** AI-generated codebases typically implement auth checks inline in every route handler, with slightly different logic each time. Your first refactoring pass should extract all auth logic into middleware. For Express, create a single `requireAuth` middleware and a `requireOwnership` middleware that handles resource-level authorization. For Next.js App Router, build a wrapper function or use the middleware.ts file to protect route groups. This single change eliminates the most dangerous class of bugs: inconsistent authorization across endpoints.
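A sketch of both middlewares for Express; the session lookup and ownership query (`getUserFromSession`, `getResourceOwnerId`) are placeholders for your actual auth and data layers:

```typescript
import type { Request, Response, NextFunction } from "express";
import { getUserFromSession } from "./auth"; // hypothetical
import { getResourceOwnerId } from "./data"; // hypothetical

export async function requireAuth(req: Request, res: Response, next: NextFunction) {
  const user = await getUserFromSession(req);
  if (!user) {
    return res.status(401).json({ error: "Unauthorized" });
  }
  (req as any).user = user; // or extend Express.Request via declaration merging
  next();
}

// requireOwnership("id") protects routes like GET /api/projects/:id
export function requireOwnership(param: string) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const ownerId = await getResourceOwnerId(req.params[param]);
    if (!ownerId || ownerId !== (req as any).user?.id) {
      return res.status(403).json({ error: "Forbidden" });
    }
    next();
  };
}

// Usage: app.get("/api/projects/:id", requireAuth, requireOwnership("id"), handler);
```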

**Standardize error handling.** AI code handles errors differently in every file. One route catches errors and returns a 500 with a stack trace (information disclosure). Another route does not catch errors at all (crash). A third catches errors but returns a 200 with an error message in the body (broken semantics). Create a global error handler that logs the full error internally and returns sanitized, consistent error responses to the client. In Express, this is a four-argument middleware function at the end of your middleware chain. In Next.js, it is an error.tsx boundary combined with try-catch wrappers in your server actions.
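In Express, that global handler is a sketch like this, registered after every route:

```typescript
import type { Request, Response, NextFunction } from "express";

// Express recognizes error middleware by its four-argument signature.
export function errorHandler(
  err: Error,
  req: Request,
  res: Response,
  _next: NextFunction
) {
  // Full detail goes to your logs, never to the client.
  console.error({ path: req.path, message: err.message, stack: err.stack });
  // The client gets a consistent, sanitized response with correct semantics.
  res.status(500).json({ error: "Internal server error" });
}

// Registered last: app.use(errorHandler);
```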

![Laptop screen showing clean code refactoring and development environment](https://images.unsplash.com/photo-1517694712202-14dd9538aa97?w=800&q=80)

**Consolidate data access patterns.** AI-generated code frequently scatters database queries across route handlers, utility functions, and even React components (in the case of Supabase client-side queries). Pull all database access into a dedicated data access layer or repository pattern. Each database table gets a module with functions like `getUserById`, `updateUserProfile`, and `deleteUser`. These functions handle input validation, field selection (so you never accidentally return password hashes), and proper error mapping. Every route handler calls these functions instead of writing raw queries.
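A sketch of one such repository module, assuming Prisma and a `user` model (the field list is illustrative):

```typescript
// repositories/users.ts -- the only file that may query the users table
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Fields safe to return to clients; passwordHash is deliberately absent.
const publicFields = { id: true, name: true, email: true };

export function getUserById(id: string) {
  return prisma.user.findUnique({ where: { id }, select: publicFields });
}

export function updateUserProfile(id: string, data: { name?: string }) {
  return prisma.user.update({ where: { id }, data, select: publicFields });
}

export function deleteUser(id: string) {
  return prisma.user.delete({ where: { id } });
}
```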

**Add input validation at the boundary.** Install Zod and create validation schemas for every API endpoint's request body, query parameters, and URL parameters. Parse incoming data at the very top of each handler before any business logic executes. This single practice eliminates injection vulnerabilities, type confusion bugs, and a whole class of unexpected behavior caused by malformed input. AI tools occasionally generate Zod schemas, but they almost never apply them consistently across all endpoints.
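A sketch of boundary validation with Zod in a route handler; the schema shape is illustrative:

```typescript
import { z } from "zod";

const createProjectSchema = z.object({
  name: z.string().min(1).max(120),
  description: z.string().max(2000).optional(),
});

export async function POST(req: Request) {
  // Validate before any business logic runs.
  const parsed = createProjectSchema.safeParse(await req.json());
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }
  const { name, description } = parsed.data; // fully typed and validated
  // ...business logic...
  return Response.json({ name, description }, { status: 201 });
}
```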

**Remove dead code and unused dependencies.** AI-generated projects accumulate dead code faster than human-written ones. Every time you re-prompt the AI to try a different approach, the old approach's code often stays in the codebase. Run `knip` or `ts-prune` to identify unused exports, unreachable functions, and orphaned files. Remove them aggressively. Then audit your package.json: AI tools add dependencies liberally, and your project likely includes packages that were used in an earlier iteration but are no longer imported anywhere. Every unnecessary dependency is an unnecessary attack surface.

## Code Quality Gates Before Launch

You have scanned for vulnerabilities, refactored the worst structural issues, and centralized your auth logic. Before you launch, there is a final set of quality gates that separate amateur deployments from professional ones. These are the checks that protect you when things go wrong in production, because they will go wrong.

**Environment and secrets management.** Verify that zero secrets exist in your codebase or git history. Run Gitleaks against your full repository history, not just the latest commit. Set up your environment variables in your deployment platform (Vercel, Railway, Fly.io) and verify that your application fails gracefully when a required environment variable is missing. The app should refuse to start with a clear error message, not crash mid-request when it first tries to use the undefined key. Write a startup validation function that checks every required env var and exits immediately if any are absent.
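A sketch of that startup validation function; the variable list is illustrative:

```typescript
// env.ts (illustrative) -- import before anything else in the entrypoint
const REQUIRED_ENV_VARS = ["DATABASE_URL", "SESSION_SECRET", "STRIPE_SECRET_KEY"];

const missing = REQUIRED_ENV_VARS.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`Missing required environment variables: ${missing.join(", ")}`);
  process.exit(1); // refuse to start, rather than crash mid-request later
}
```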

**HTTPS, CORS, and security headers.** Every production deployment needs strict CORS configuration that allows only your actual domains, not the wildcard `*` that AI tools default to. Set security headers including Content-Security-Policy, X-Content-Type-Options, Strict-Transport-Security, and X-Frame-Options. If you are on Vercel or Netlify, configure these in your vercel.json or netlify.toml. For Express apps, the `helmet` middleware sets sensible defaults. Test your headers with securityheaders.com and aim for an A grade minimum.
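For an Express app, a sketch with `helmet` and the `cors` package, locked to an explicit origin (the domain is illustrative):

```typescript
import express from "express";
import helmet from "helmet";
import cors from "cors";

const app = express();

// helmet() sets sensible defaults for CSP, HSTS, X-Content-Type-Options,
// X-Frame-Options, and more.
app.use(helmet());

app.use(
  cors({
    origin: ["https://app.example.com"], // never the wildcard "*" in production
    credentials: true,
  })
);
```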

**Rate limiting on all public endpoints.** Implement rate limiting on authentication endpoints (login, signup, password reset), API endpoints that perform mutations, and any endpoint that triggers external API calls or sends emails. For Express, `express-rate-limit` with a Redis store handles this cleanly. For Next.js on Vercel, use Vercel's built-in WAF rules or Upstash Redis-based rate limiting. Typical thresholds: 5 login attempts per minute per IP, 60 API requests per minute per authenticated user, and 3 password reset requests per hour per email address.
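A sketch of the login limiter with `express-rate-limit`, using the thresholds above (add a Redis store if you run more than one instance):

```typescript
import rateLimit from "express-rate-limit";

// 5 login attempts per minute per IP.
export const loginLimiter = rateLimit({
  windowMs: 60 * 1000,
  limit: 5,              // named `max` in versions before v7
  standardHeaders: true, // return RateLimit-* headers to the client
  legacyHeaders: false,
});

// Usage: app.post("/api/login", loginLimiter, loginHandler);
```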

**Logging and monitoring.** AI-generated apps almost never include proper logging. Before launch, add structured logging to every API endpoint that captures the request method, path, user ID (if authenticated), response status code, and response time. Use a service like Axiom, Datadog, or even a simple Loki stack to aggregate and search logs. Set up alerts for error rate spikes, unusual traffic patterns, and failed authentication attempts. The cost is $20 to $50 per month for a small application, and it is the difference between discovering a breach in minutes versus discovering it when a customer emails you.
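A minimal sketch of that per-request log line for Express, using `pino` for structured output:

```typescript
import pino from "pino";
import type { Request, Response, NextFunction } from "express";

const logger = pino();

export function requestLogger(req: Request, res: Response, next: NextFunction) {
  const start = Date.now();
  res.on("finish", () => {
    logger.info(
      {
        method: req.method,
        path: req.path,
        status: res.statusCode,
        durationMs: Date.now() - start,
        userId: (req as any).user?.id ?? null, // present if auth middleware ran
      },
      "request completed"
    );
  });
  next();
}
```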

**Backup and recovery testing.** If your application uses a database, verify that automated backups are running and that you can actually restore from one. We have seen too many AI-generated apps deployed on Supabase or PlanetScale with backups enabled but never tested. Run a restore drill before launch. Spin up a test database from the latest backup and verify that your application functions correctly against it. This takes an hour and could save your entire company.

**Load testing critical paths.** AI-generated code often contains performance bottlenecks that only surface under load: N+1 query patterns, missing database indexes, synchronous operations that should be async, and unbounded result sets that return thousands of rows when a user has enough data. Run a basic load test with k6 or Artillery against your critical user flows. Even 50 concurrent users for 5 minutes will reveal the worst bottlenecks. Fix anything that degrades beyond acceptable response times before your first real users hit those same paths.
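A minimal k6 script along those lines; k6 scripts are written in JavaScript, and the URL and threshold here are illustrative:

```javascript
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 50,          // 50 concurrent virtual users
  duration: "5m",
  // Fail the run if the 95th-percentile response time exceeds 500ms.
  thresholds: { http_req_duration: ["p(95)<500"] },
};

export default function () {
  const res = http.get("https://staging.example.com/api/projects"); // illustrative
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```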

## Bridging the Gap: From Vibe-Coded MVP to Production-Grade App

The AI coding revolution is real, and the productivity gains are enormous. Founders who would have needed three months and a $50,000 budget to build a working MVP can now ship in a weekend. That changes the game for early-stage startups, and we are fully on board with it. The [cost reductions from AI-assisted development](/blog/ai-agents-reducing-development-costs) are transforming how software gets built.

But the gap between "it works in a demo" and "it is safe to put customer data in" is wider than most founders realize. That gap is where companies get breached, lose customer trust, fail compliance audits, and watch their reputations unravel. Vibe coding gets you to market fast. Security and code quality keep you in the market.

The playbook is straightforward. Use AI tools aggressively for the initial build. Then apply the systematic security and quality processes outlined in this guide before you expose the application to real users. Automate the scanning. Refactor the critical paths. Lock down the auth layer. Add the quality gates. This work typically takes 2 to 4 weeks for a moderately complex application, which is a small investment relative to the time AI saved you during the build phase.

If you are a technical founder, you can implement most of this yourself using the tools and workflows we have described. Set up Semgrep and Gitleaks in CI this week. Run Snyk against your dependencies today. Audit every API route for missing auth checks this weekend. These three actions alone will eliminate the majority of critical vulnerabilities in a typical vibe-coded application.

If you are a non-technical founder who used Bolt or Lovable to build your app, the calculus is different. You need someone who understands both the security landscape and the specific patterns that AI tools produce. A generalist developer who has never audited AI-generated code will miss the systematic vulnerabilities that LLMs introduce. You need a team that has [guided companies through security audits](/blog/how-to-pass-a-security-audit) and knows exactly where AI-generated code breaks down.

We run security audits and production-hardening engagements specifically for AI-generated codebases. The typical engagement takes 2 to 3 weeks: one week of automated and manual security review, one week of refactoring and remediation, and a final round of validation and deployment. The result is a codebase you can confidently put in front of paying customers, take through a SOC 2 audit, and hand off to a future engineering team without them rewriting everything from scratch.

Your vibe-coded MVP proved the market wants what you are building. Now it is time to make sure the foundation can support the growth. [Book a free strategy call](/get-started) and we will review your AI-generated codebase, identify the highest-risk vulnerabilities, and map out a plan to get you to production-grade in weeks, not months.

---

*Originally published on [Kanopy Labs](https://kanopylabs.com/blog/vibe-coding-to-production-quality-guide)*
