Three Philosophies for the Same Problem
Background job processing in Node.js has split into three distinct camps, and each one makes a fundamentally different bet about where your infrastructure complexity should live. Trigger.dev v3 says "let us handle everything." BullMQ says "you already have Redis, so use it." Graphile Worker says "your Postgres database is enough." None of them are wrong. But picking the wrong one for your situation will cost you months of refactoring, unexpected bills, or operational headaches that drain your team's energy.
We have deployed all three across client projects at Kanopy. A fintech startup processing 2M daily webhook deliveries on BullMQ. A healthcare SaaS running HIPAA-compliant document pipelines on Trigger.dev v3. An analytics platform that refused to add Redis to its stack, so we built its entire job layer on Graphile Worker. Each tool earned its place, but the decision was never obvious up front.
This guide breaks down the architecture, developer experience, failure handling, scheduling, concurrency, observability, and cost of each option. If you have read our Inngest vs Trigger.dev vs Temporal comparison, think of this as the practical counterpart. That article covered the managed orchestration tier. This one covers the tools you reach for when you want more control, lower cost, or a tighter coupling to your existing infrastructure.
Architecture: Managed Cloud vs Redis vs Postgres
The architectural choice you make here cascades into every other decision: deployment, scaling, monitoring, cost, and hiring. Understanding what each tool actually does under the hood is the fastest way to rule out the wrong option.
Trigger.dev v3: Managed Workers on Their Cloud
Trigger.dev v3 is a complete rewrite from v2. Your task code deploys to Trigger.dev's managed infrastructure (built on top of Kubernetes and their own container runtime). You write a task file, run the CLI deploy command, and Trigger.dev handles worker provisioning, scaling, log aggregation, and retry orchestration. The execution happens on their servers, not yours. Your application triggers tasks via the SDK or REST API, and Trigger.dev takes it from there.
This is a fundamentally different model from BullMQ or Graphile Worker. You are not running workers in your own infrastructure. You are shipping code to a platform that runs it for you. The upside is zero operational overhead. The downside is that your job code runs in someone else's environment, which matters for compliance-sensitive workloads. Self-hosting is available through their open-source Docker setup, but most teams use the managed cloud.
BullMQ: Redis-Backed Queues You Run Yourself
BullMQ is a Node.js library. You install it via npm, point it at a Redis instance, and it gives you named queues, workers, job scheduling, and retry logic. Your workers run in your own processes, on your own servers, in your own containers. BullMQ does not host anything for you. It is a library, not a platform.
This means you own the entire stack. You run Redis (or use a managed Redis service like AWS ElastiCache, Upstash, or Redis Cloud). You deploy your worker processes alongside your application. You handle scaling, monitoring, and log aggregation. The tradeoff is total control with total responsibility. BullMQ has been in production at thousands of companies for years, with the npm package pulling over 1.5 million weekly downloads. It is the most battle-tested option in this comparison.
Graphile Worker: Jobs Inside Your Postgres Database
Graphile Worker takes the most radical approach. It uses your existing Postgres database as the job queue. Jobs are rows in a table. Workers claim rows from that table with FOR UPDATE SKIP LOCKED (a Postgres feature designed for exactly this pattern), so two workers never pick up the same job and neither blocks waiting on the other. There is no Redis, no external message broker, no separate infrastructure to manage. If you have Postgres, you have a job queue.
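To make the claiming step concrete, here is a minimal sketch of the SKIP LOCKED pattern. It is not Graphile Worker's actual query: the app_jobs table, its columns, and the SqlClient interface (shaped like node-postgres' query method) are all hypothetical, chosen only for illustration.

```typescript
// Minimal client shape, modelled on node-postgres' `client.query(text, values)`.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical table: app_jobs(id, task, payload, run_at, locked_at).
// The inner SELECT locks one ready row; SKIP LOCKED makes concurrent workers
// pass over rows another transaction has already locked instead of waiting.
const CLAIM_NEXT_JOB_SQL = `
  UPDATE app_jobs
     SET locked_at = now()
   WHERE id = (
     SELECT id
       FROM app_jobs
      WHERE locked_at IS NULL
        AND run_at <= now()
      ORDER BY run_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id, task, payload`;

// Returns the claimed job row, or null when no job is ready.
async function claimNextJob(client: SqlClient): Promise<Record<string, unknown> | null> {
  const result = await client.query(CLAIM_NEXT_JOB_SQL);
  return result.rows[0] ?? null;
}
```

Because locked rows are skipped rather than waited on, several workers running this statement at the same moment each receive a different job, or nothing at all.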
The elegance of this approach is transactional consistency. You can enqueue a job inside the same database transaction that creates a user record. If the transaction rolls back, the job never exists. This is impossible with Redis-backed queues without building your own outbox pattern. For applications where job creation must be atomic with your business logic, Graphile Worker eliminates an entire class of bugs that plague Redis-based systems.
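A minimal sketch of transactional enqueueing, assuming a node-postgres-style client. The users table and the send_welcome_email task name are hypothetical; graphile_worker.add_job is the SQL function Graphile Worker installs in your database.

```typescript
// Minimal client shape, modelled on node-postgres' `client.query(text, values)`.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Creates a user and enqueues a welcome email in ONE transaction.
// `users` and `send_welcome_email` are hypothetical names.
async function createUserWithWelcomeJob(client: SqlClient, email: string): Promise<void> {
  await client.query("BEGIN");
  try {
    await client.query("INSERT INTO users (email) VALUES ($1)", [email]);
    await client.query(
      "SELECT graphile_worker.add_job('send_welcome_email', json_build_object('email', $1::text))",
      [email],
    );
    await client.query("COMMIT");
  } catch (err) {
    // Rolling back discards the user row AND the job row together.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

If either statement fails, neither the user nor the job exists afterwards, which is the guarantee a Redis-backed queue can only approximate with an outbox table.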
TypeScript Support and Developer Experience
All three tools work with TypeScript, but the depth of that support varies dramatically. If your team is TypeScript-first, this section will matter more than pricing.
Trigger.dev v3: TypeScript-Native from the Ground Up
Trigger.dev v3 was built in TypeScript, for TypeScript. Task definitions are fully typed. Input payloads can be validated with Zod schemas. The SDK provides type inference for task triggers, so when you call myTask.trigger({ userId: "abc" }), the compiler knows the shape of the payload. The CLI scaffolds task files with proper types. The dev server gives you hot reloading during local development. Of the three tools, Trigger.dev offers the most polished TypeScript experience, and it is not close.
BullMQ: Strong Types with Manual Setup
BullMQ ships with TypeScript definitions and supports generic type parameters on queues and workers. You can define Queue<MyJobData> and Worker<MyJobData> to get type safety on job payloads. However, BullMQ does not enforce a project structure or provide scaffolding. You wire up your own queue definitions, worker files, and connection management. The types are solid, but the developer experience is "library-grade" rather than "platform-grade." You get flexibility at the cost of boilerplate.
Graphile Worker: JavaScript-First with TypeScript Support
Graphile Worker supports TypeScript, but it was designed with JavaScript in mind. Task functions live in a task directory, one file per task, with the file name serving as the task identifier. Type safety on job payloads requires manual interface definitions. The library works fine with TypeScript, but you will not get the same level of type inference and IDE integration that Trigger.dev provides. If TypeScript DX is your top priority, Graphile Worker is the weakest of the three, though it is perfectly usable.
Local Development
Trigger.dev's dev server runs your tasks locally with hot reloading and a local dashboard UI. BullMQ requires you to run Redis locally (via Docker or a native install) and spin up your worker process manually. Graphile Worker just needs your local Postgres database, which you almost certainly already have running. For sheer simplicity, Graphile Worker wins the local dev experience. For tooling richness, Trigger.dev wins.
Retries, Dead Letter Queues, and Failure Handling
Your background job system is defined by how it handles failure. Every job will fail eventually. Network timeouts, rate limits, out-of-memory errors, bad input data. The question is whether the failure is graceful, observable, and recoverable.
Trigger.dev v3 Retry Model
Trigger.dev v3 provides declarative retry configuration per task. You specify maxAttempts, a backoff strategy (fixed, exponential, or custom), and the minimum and maximum delay between attempts. When a task exhausts its retries, it moves to a "failed" state visible in the dashboard. You can inspect the full execution log, see the error stack trace for each attempt, and manually retry from the dashboard with one click. There is no traditional dead letter queue because Trigger.dev is the queue. Failed tasks stay in the system permanently until you resolve or delete them.
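A generic sketch of exponential backoff bounded by a minimum and maximum delay. The option names below are illustrative and are not taken from the Trigger.dev SDK; check the SDK documentation for the exact fields and for any jitter it applies.

```typescript
interface BackoffOptions {
  maxAttempts: number; // total attempts, including the first run
  factor: number;      // multiplier applied after each failure
  minDelayMs: number;  // delay before the second attempt
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay to wait after `failedAttempt` (1-based) fails,
// or null when the attempts are exhausted.
function nextRetryDelayMs(failedAttempt: number, opts: BackoffOptions): number | null {
  if (failedAttempt >= opts.maxAttempts) return null;
  const raw = opts.minDelayMs * Math.pow(opts.factor, failedAttempt - 1);
  return Math.min(raw, opts.maxDelayMs);
}
```

With maxAttempts of 5, a factor of 2, and delays bounded between 1 and 5 seconds, the waits are 1s, 2s, 4s, and 5s, after which the run is marked failed.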
BullMQ Retry and Dead Letter Queues
BullMQ has the most mature retry tooling of the three. You configure attempts and backoff per job, or set queue-wide defaults through defaultJobOptions. When retries are exhausted, BullMQ moves the job to the queue's failed set. BullMQ has no dedicated dead letter queue primitive, but the common pattern is a second BullMQ queue: a listener on the worker's failed event copies exhausted jobs into it for downstream processing, alerting, or manual review. This pattern integrates naturally with event-driven architectures where failed jobs trigger compensating actions.
BullMQ also supports per-job retry delays, custom backoff functions (not just exponential), and the ability to programmatically move jobs between states. You can remove a job from "failed," modify its data, and re-add it to the queue. This level of control is unmatched by either Trigger.dev or Graphile Worker.
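The dead letter decision itself is small enough to sketch without Redis. The types and the toDeadLetter helper below are hypothetical; in a BullMQ setup, a function like this would typically be called from a worker's failed event handler, with the result added to a second queue.

```typescript
interface FailedJob<T> {
  id: string;
  data: T;
  attemptsMade: number; // attempts used so far, including the one that just failed
  maxAttempts: number;  // configured attempt budget
  failedReason: string;
}

interface DeadLetterEntry<T> {
  originalJobId: string;
  data: T;
  failedReason: string;
}

// Decide what happens to a job after a failed attempt.
// Returns a dead letter entry only once every attempt has been used.
function toDeadLetter<T>(job: FailedJob<T>): DeadLetterEntry<T> | null {
  if (job.attemptsMade < job.maxAttempts) return null; // will still be retried
  return { originalJobId: job.id, data: job.data, failedReason: job.failedReason };
}
```

Keeping the original job id and payload in the entry is what makes later inspection, editing, and re-enqueueing possible.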
Graphile Worker Retry Behavior
Graphile Worker supports retries with exponential backoff. You set max_attempts when you add a job (the default is 25). When a job fails and has remaining attempts, Graphile Worker updates the row in Postgres with the next retry timestamp and the error details. When retries are exhausted, the job is marked as permanently failed. There is no built-in dead letter queue concept, but because jobs are just Postgres rows, you can query for failed jobs, build your own alerting with a simple SQL query, or create a trigger that moves failed rows to a separate table.
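For a sense of scale, the sketch below computes retry delays using the formula Graphile Worker documents, exp(least(10, attempt)) seconds. Treat the formula as an assumption to verify against the version you run; the function names are ours.

```typescript
// Seconds to wait after the given failed attempt (1-based),
// following exp(least(10, attempt)).
function graphileRetryDelaySeconds(attempt: number): number {
  return Math.exp(Math.min(10, attempt));
}

// Total seconds spent waiting across the first `attempts` failures.
function cumulativeDelaySeconds(attempts: number): number {
  let total = 0;
  for (let a = 1; a <= attempts; a++) total += graphileRetryDelaySeconds(a);
  return total;
}
```

Under this formula the first retry comes after about 2.7 seconds, the third after about 20 seconds, and every retry from the tenth onward waits roughly 6.1 hours, so a job with a large attempt budget can stay in the retry cycle for days before it is marked permanently failed.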
The Postgres-native approach to failure handling has an underrated advantage: you can JOIN failed jobs against your business data. Want to see all failed invoice-generation jobs for customers on the Enterprise plan? That is a single SQL query. Try doing that with Redis.
Cron Scheduling, Concurrency Control, and Long-Running Jobs
Beyond basic "enqueue and process" functionality, most production systems need scheduled jobs, concurrency limits, and support for tasks that run longer than a few seconds. Here is how each tool handles these requirements.
Cron Scheduling
Trigger.dev v3 has first-class cron support. You define a schedule directly in your task file using standard cron syntax, and Trigger.dev runs it automatically. No external cron service, no additional infrastructure. The dashboard shows scheduled task history, upcoming runs, and lets you manually trigger a scheduled task outside its normal cadence.
BullMQ supports repeatable jobs with cron expressions. You add a job with a repeat option specifying the cron pattern, and BullMQ handles the scheduling. The implementation is solid, but you need to be careful about duplicate schedules. Registering the same job name with identical repeat options is safe to do from every instance, but if the cron pattern or options change between deploys, the old schedule stays in Redis and keeps firing alongside the new one. The standard pattern is to list existing repeatable jobs on startup and remove any that are stale. This is a well-known gotcha that trips up many teams at least once.
Graphile Worker uses a crontab file (or programmatic cron definitions) to define scheduled tasks. The syntax is familiar, the behavior is reliable, and because the schedule state lives in Postgres, it is naturally consistent across multiple worker instances. No duplicate schedule problem. Graphile Worker handles leader election internally so only one worker triggers each scheduled run.
Concurrency Control
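Trigger.dev v3 lets you set concurrency limits per task (the concurrencyLimit setting on the task's queue in the v3 SDK). Trigger.dev ensures no more than that many instances of the task run simultaneously across all workers. This is useful for rate-limited APIs, with one caveat: a concurrency cap bounds in-flight runs, not requests per second. If Stripe allows 100 requests per second and each call takes about 500ms, capping your stripe-sync task at 50 concurrent runs keeps you at or below roughly 100 requests per second. Faster calls need a lower cap.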
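Whether a given concurrency cap keeps you under a requests-per-second limit depends on how long each call takes. The arithmetic is Little's law, sketched here with hypothetical helper names and the assumption that each run makes one request at a time.

```typescript
// Upper bound on requests/second implied by a concurrency cap:
// throughput <= concurrency / average seconds per request.
function maxRequestsPerSecond(concurrency: number, avgLatencyMs: number): number {
  return concurrency / (avgLatencyMs / 1000);
}

// Largest concurrency that keeps throughput at or under the provider's limit.
function safeConcurrency(limitPerSecond: number, avgLatencyMs: number): number {
  return Math.max(1, Math.floor(limitPerSecond * (avgLatencyMs / 1000)));
}
```

At 500ms per call, 50 concurrent runs top out at 100 requests per second. At 250ms per call the same cap allows 200 per second, and the safe concurrency for a 100 per second limit drops to 25.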
BullMQ provides concurrency control at the worker level (how many jobs a single worker processes in parallel) and at the queue level via rate limiters. The rate limiter lets you specify max jobs per time window, which maps directly to API rate limits. Group-level concurrency and rate limits, which let you throttle one customer or tenant without slowing the rest of the queue, are available in the commercial BullMQ Pro edition. BullMQ's concurrency controls are the most flexible of the three, at the cost of more configuration.
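The "max jobs per time window" idea can be sketched as a fixed-window limiter. This is an in-process illustration with hypothetical names, not BullMQ's implementation: BullMQ enforces its limit in Redis so that it holds across all workers on a queue.

```typescript
// Fixed-window limiter: at most `max` jobs may start in each `durationMs` window.
class FixedWindowLimiter {
  private readonly max: number;
  private readonly durationMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(max: number, durationMs: number) {
    this.max = max;
    this.durationMs = durationMs;
  }

  // Returns true if a job may start at time `nowMs`, false if it must wait.
  tryStart(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.durationMs) {
      // A new window begins: reset the counter.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.max) return false;
    this.count += 1;
    return true;
  }
}
```

A limiter built with a max of 100 and a duration of 1000ms maps directly onto an API that allows 100 requests per second.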
Graphile Worker sets concurrency at the worker level with a concurrency flag. Each worker process handles up to N jobs in parallel. There is no built-in per-task concurrency limit or rate limiter. If you need to limit a specific task type to 5 concurrent executions, you need to implement that yourself using Postgres advisory locks or a semaphore pattern. This is the biggest gap in Graphile Worker's feature set.
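A minimal sketch of the semaphore pattern mentioned above, using hypothetical names. It only limits jobs inside a single worker process; enforcing a limit across several worker processes would need shared state such as Postgres advisory locks.

```typescript
// Tracks in-flight jobs per task name inside ONE worker process.
class TaskConcurrencyGate {
  private readonly inFlight = new Map<string, number>();
  private readonly limits = new Map<string, number>();

  constructor(limits: Record<string, number>) {
    for (const task of Object.keys(limits)) {
      this.limits.set(task, limits[task]);
    }
  }

  // Reserve a slot; returns false when the task is at its limit
  // (the caller would typically reschedule the job for later).
  tryAcquire(task: string): boolean {
    const limit = this.limits.get(task) ?? Infinity; // unlisted tasks are unlimited
    const current = this.inFlight.get(task) ?? 0;
    if (current >= limit) return false;
    this.inFlight.set(task, current + 1);
    return true;
  }

  // Free a slot once the job finishes, whether it succeeded or failed.
  release(task: string): void {
    const current = this.inFlight.get(task) ?? 0;
    this.inFlight.set(task, Math.max(0, current - 1));
  }
}
```

A task handler would call tryAcquire at the start and release in a finally block, so a crashing job never leaks its slot.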
Long-Running Job Support
Trigger.dev v3 excels at long-running jobs. Tasks can run for up to 2.5 hours on the paid plan (and up to 24 hours on enterprise plans). The platform streams logs in real time, so you can watch a 30-minute data migration execute live. This is Trigger.dev's strongest competitive advantage over BullMQ, where long-running jobs risk Redis connection timeouts and worker process management headaches.
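BullMQ jobs should generally complete within minutes. A worker holds a lock on each active job and renews it in the background, and a job whose lock expires is treated as stalled and may be picked up again. For longer tasks, you need to keep the event loop free so the lock can be renewed (CPU-bound work that blocks it is the usual cause of stalled jobs), raise lockDuration where appropriate, and handle worker restarts gracefully. It works, but it requires careful engineering. Most BullMQ users keep individual jobs short and break long operations into chains of smaller jobs.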
Graphile Worker does not impose time limits on jobs (your Postgres connection timeout is the practical limit). However, long-running jobs hold a Postgres connection for their entire duration, which eats into your connection pool. For jobs running longer than a few minutes, this can become a bottleneck. The recommended pattern is the same as BullMQ: break long operations into smaller, chained jobs.
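Breaking a long operation into smaller jobs usually starts with a batching plan. The sketch below uses a hypothetical payload shape (offset and limit) and is queue-agnostic: each returned object would become one job's payload in BullMQ or Graphile Worker.

```typescript
interface BatchJob {
  offset: number; // index of the first record in this batch
  limit: number;  // number of records to process
}

// Split `totalRecords` into batch payloads of at most `batchSize` records each.
function planBatches(totalRecords: number, batchSize: number): BatchJob[] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const jobs: BatchJob[] = [];
  for (let offset = 0; offset < totalRecords; offset += batchSize) {
    jobs.push({ offset, limit: Math.min(batchSize, totalRecords - offset) });
  }
  return jobs;
}
```

A 2,500-record migration with a batch size of 1,000 becomes three jobs, each short enough to retry independently if it fails.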
Observability, Dashboards, and Pricing at Scale
Observability is where managed platforms justify their cost, and where self-hosted solutions demand the most investment. Pricing at scale is where the math gets interesting.
Observability and Dashboards
Trigger.dev v3 includes a built-in web dashboard that shows every task run, its status, execution duration, logs, and error traces. You can filter by task name, status, or time range. You can replay failed tasks, cancel running tasks, and view real-time streaming logs. For teams that have never had visibility into their background jobs, this alone is worth the cost of admission. Trigger.dev also exports OpenTelemetry traces, so you can pipe data into Datadog, Honeycomb, or Grafana.
BullMQ has no built-in dashboard. The community has built several options: Bull Board (open-source, embeddable in Express/Fastify/Koa apps), Arena (another open-source option), and Taskforce.sh (a paid hosted dashboard). Bull Board is the most popular choice. You add it to your application, and it gives you a web UI showing queues, jobs, and their states. It is functional but not as polished as Trigger.dev's dashboard. For production monitoring, most BullMQ teams export metrics to Prometheus or Datadog using custom instrumentation, then build their own Grafana dashboards. This works well but requires upfront investment.
Graphile Worker has no built-in dashboard and no widely adopted community dashboard. Monitoring is done through SQL queries against the jobs table, custom Prometheus exporters, or application-level logging. This is the weakest observability story of the three. You can build solid monitoring with pg_stat_statements, custom queries, and Grafana, but you are building it from scratch. For teams that need visibility out of the box, this is a significant drawback.
Pricing Breakdown at Scale
Trigger.dev v3 pricing is based on compute time. The free tier includes 30,000 compute seconds per month. The Pro plan starts at $50/month. At 1M lightweight jobs per month (each averaging 500ms of compute, or about 500,000 compute seconds in total), you are looking at roughly $150 to $300/month depending on your plan. For compute-heavy jobs (AI inference, video processing), costs scale with actual CPU time, which can climb quickly. The advantage is that you pay nothing for infrastructure management, monitoring, or worker scaling.
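The compute-time arithmetic behind that estimate is worth running for your own workload. This sketch uses only the figures quoted above; the function name is ours, and it deliberately stops short of a dollar amount because per-second rates vary by plan and machine size.

```typescript
// Monthly compute seconds for `jobsPerMonth` jobs averaging `avgMs` each.
function monthlyComputeSeconds(jobsPerMonth: number, avgMs: number): number {
  return (jobsPerMonth * avgMs) / 1000;
}

const FREE_TIER_SECONDS = 30000; // free-tier figure quoted in this article

// 1M jobs at 500ms each.
const usageSeconds = monthlyComputeSeconds(1000000, 500);

// How many times over the free tier that workload lands.
const multipleOfFreeTier = usageSeconds / FREE_TIER_SECONDS;
```

Here usageSeconds comes to 500,000, which is nearly 17 times the free allowance, so a workload of this size sits firmly in paid territory.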
BullMQ itself is free and open-source. Your cost is Redis hosting plus the compute for your worker processes. On AWS, a small ElastiCache Redis instance (cache.t3.micro) runs about $15/month. A dedicated worker on a t3.medium EC2 instance adds roughly $30/month. At scale (10M+ jobs/month), you might need a larger Redis instance ($50 to $200/month) and 3 to 5 worker instances ($100 to $500/month). Total cost at scale: $150 to $700/month, but you also need to factor in the engineering time to manage, monitor, and troubleshoot the infrastructure.
Graphile Worker is free and open-source. Your cost is zero beyond your existing Postgres database. If your Postgres instance can handle the additional load (and for most applications processing under 1M jobs/month, it can), Graphile Worker adds no infrastructure cost whatsoever. At very high volumes (5M+ jobs/month), you may need to scale your Postgres instance, add read replicas, or implement table partitioning on the jobs table. But for small-to-mid-scale applications, Graphile Worker is effectively free.
The Cost Verdict
If you are bootstrapped and already running Postgres, Graphile Worker is the clear winner on cost. If you already have Redis in your stack and an ops-capable team, BullMQ is the most cost-effective option at scale. If your team's time is more expensive than Trigger.dev's monthly bill (and for most funded startups, it is), the managed platform pays for itself by eliminating operational toil. Run the numbers for your specific workload before deciding. The cheapest tool is the one that does not consume engineering hours you could spend on your product.
When to Pick Each One and Our Recommendation
After deploying all three in production across different client projects, here is the honest recommendation we give to every team that asks.
Choose Trigger.dev v3 When
You want zero operational overhead and your team should be building product, not managing infrastructure. Trigger.dev v3 is the right choice for funded startups with 3 to 15 engineers who need reliable background jobs without hiring a DevOps specialist. It shines for AI-heavy workloads (LLM calls, document processing, image generation) where tasks run for minutes and you need real-time visibility. The built-in dashboard, log streaming, and one-click retries save hours of debugging time every week. If you are evaluating Trigger.dev alongside higher-level orchestration tools, our comparison of Inngest, Trigger.dev, and Temporal covers the managed platform tier in more depth.
Choose BullMQ When
You already run Redis, your team is comfortable managing infrastructure, and you need maximum flexibility. BullMQ is the right choice for teams processing high volumes of short-lived jobs (email delivery, webhook forwarding, cache invalidation, notification fanout) where each job completes in under 30 seconds. It is the right choice when you need advanced queue patterns like priority queues, job dependencies, FIFO ordering, or complex rate limiting. If your stack already includes Redis for caching or session storage, adding BullMQ is a trivial incremental cost. BullMQ is also the safest bet if you are concerned about vendor lock-in. It is an npm package with no cloud dependency. You can switch Redis providers or even run Redis on bare metal without changing a line of application code.
Choose Graphile Worker When
You are building a Postgres-centric application and you refuse to add another piece of infrastructure to your stack. Graphile Worker is the right choice for small teams (1 to 5 engineers) building monolithic applications where the database is the center of gravity. It excels when you need transactional job enqueuing (create a user and enqueue a welcome email in the same transaction). It is perfect for CRUD-heavy SaaS products that need simple background processing without the complexity of Redis or a managed platform. If you are setting up your CI/CD pipeline and trying to keep your infrastructure minimal, Graphile Worker aligns perfectly with that philosophy.
The Decision Matrix
- Team wants zero-ops, has budget for a managed service: Trigger.dev v3. You will be running production jobs by end of day.
- Redis already in the stack, high-volume short jobs: BullMQ. Battle-tested, flexible, and you control everything.
- Postgres-only stack, transactional consistency matters: Graphile Worker. No new infrastructure, no new billing line items.
- AI/ML workloads with tasks running 5+ minutes: Trigger.dev v3. The long-running job support and streaming logs are unmatched.
- Need dead letter queues and advanced queue patterns: BullMQ. Its queue primitives are the most powerful of the three.
- Bootstrapped with under 100K jobs/month: Graphile Worker. Zero additional cost if you already have Postgres.
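For teams that prefer to see the matrix as logic, here is one way to encode it. The TeamProfile fields and the order in which rows are checked are our assumptions for illustration; real decisions involve trade-offs a boolean cannot capture.

```typescript
type Tool = "Trigger.dev v3" | "BullMQ" | "Graphile Worker";

interface TeamProfile {
  wantsZeroOps: boolean;        // has budget for a managed service
  longRunningTasks: boolean;    // AI/ML tasks running 5+ minutes
  redisInStack: boolean;        // high-volume short jobs on existing Redis
  needsAdvancedQueues: boolean; // dead letter queues, priorities, rate limits
  postgresOnly: boolean;        // transactional enqueueing or minimal cost matters
}

// Returns the first matching row of the matrix, or null when none applies.
// The precedence (managed, then Redis, then Postgres) is an assumption.
function recommend(p: TeamProfile): Tool | null {
  if (p.wantsZeroOps || p.longRunningTasks) return "Trigger.dev v3";
  if (p.redisInStack || p.needsAdvancedQueues) return "BullMQ";
  if (p.postgresOnly) return "Graphile Worker";
  return null;
}
```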
What We Build for Our Clients
For most SaaS products we build at Kanopy, we start with one of these three tools rather than reaching for heavier orchestration platforms. The right choice depends on the existing stack, the team's operational maturity, and the nature of the workload. We have seen teams waste months building custom queue infrastructure that one of these tools handles out of the box. We have also seen teams adopt Trigger.dev when BullMQ would have been simpler and cheaper, or stick with a hacked-together cron setup when Graphile Worker would have solved their problems in an afternoon.
If you are building a product and background jobs are becoming a pain point, or if you are about to make this infrastructure decision for the first time, our engineering team can help you evaluate the options against your specific requirements. Book a free strategy call and we will map out the right architecture for your workload, team size, and budget.