When to Migrate (and When to Stay Put)
Before you touch a single line of code, you need to answer one honest question: is your monolith actually the problem? In our experience building and refactoring systems for over 200 companies, roughly half the teams that come to us wanting microservices would be better served by cleaning up their existing monolith. A well-structured modular monolith can handle millions of requests per day. If your pain is a messy codebase, microservices will just give you a messy distributed system.
You should seriously consider migrating when you hit at least two of these signals:
- Deployment bottlenecks: Multiple teams step on each other during releases, and a small change in the billing module forces a full regression of the entire application.
- Scaling mismatches: Your search feature needs 10x the compute of your user profile module, but you are forced to scale the entire monolith to handle search traffic spikes.
- Technology lock-in: You need to adopt a different language or framework for a specific domain (Python for ML, Go for a high-throughput data pipeline), but your monolith is a single-language fortress.
- Team autonomy: You have grown past 20 engineers, and cross-team coordination on a shared codebase is slowing everyone down.
If none of those resonate, stop here. Invest in better module boundaries, automated testing, and CI/CD instead. A monolith deployed in 5 minutes beats a microservices mesh that takes 45 minutes to get through a staging pipeline. For a deeper comparison of the trade-offs, read our breakdown on monolith vs. microservices architecture.
The Strangler Fig Pattern: Your Migration Framework
The strangler fig pattern is the single most important concept in any monolith-to-microservices migration. Named after the tropical fig that gradually grows around a host tree until it replaces it entirely, this pattern lets you incrementally extract services from your monolith without a risky big-bang rewrite.
Here is how it works in practice:
- Step 1: Place an API gateway or reverse proxy (Kong, AWS API Gateway, or even Nginx) in front of your monolith. All traffic flows through this layer.
- Step 2: Identify one bounded context to extract. Build the new microservice alongside the monolith.
- Step 3: Route traffic for that specific domain from the gateway to the new service instead of the monolith.
- Step 4: Once the new service is stable and fully handling production traffic, delete the corresponding code from the monolith.
- Step 5: Repeat for the next bounded context.
The beauty of this approach is that at every point in the migration, you have a working system. If the new service has problems, you route traffic back to the monolith in seconds. Your users never know a migration is happening. We typically see teams extract one service every 4 to 6 weeks using this pattern, which means a complex monolith with 8 to 10 bounded contexts takes roughly 32 to 60 weeks, or about 7 to 14 months, to fully decompose. That sounds slow, but it is faster (and far safer) than a ground-up rewrite that takes 18 months and has a coin-flip chance of success.
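One critical rule: never split ownership of a capability. Do not leave the monolith and the new service both acting as the system of record for the same domain, with some writes landing in the old tables and some in the new database. That creates data consistency nightmares. Either the monolith owns a capability or the new service does. A short canary ramp during cutover is fine as long as both code paths read and write the same source of truth. Use feature flags to control the cutover, and finish it as a clean switch rather than leaving traffic split indefinitely.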
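The routing rule above can be sketched in a few lines. This is a minimal illustration, not a gateway implementation: the `StranglerRouter` class, the upstream URLs, and the path prefixes are all hypothetical, and a real gateway such as Kong or Nginx would hold the same information in its own configuration.

```python
from dataclasses import dataclass, field

MONOLITH = "http://monolith.internal"  # hypothetical upstream URL


@dataclass
class StranglerRouter:
    """Maps each path prefix to exactly one owner: the monolith or a new service."""

    # prefix -> upstream URL of the extracted service (hypothetical names)
    extracted: dict = field(default_factory=dict)
    # prefix -> True once the cutover flag has been switched on
    flags: dict = field(default_factory=dict)

    def register(self, prefix, upstream):
        """Register a new service; traffic stays on the monolith until cut over."""
        self.extracted[prefix] = upstream
        self.flags[prefix] = False

    def cut_over(self, prefix):
        self.flags[prefix] = True

    def roll_back(self, prefix):
        self.flags[prefix] = False

    def upstream_for(self, path):
        """Return the single upstream that owns this path right now."""
        matches = [p for p in self.extracted if path.startswith(p)]
        if not matches:
            return MONOLITH
        prefix = max(matches, key=len)  # the longest matching prefix wins
        return self.extracted[prefix] if self.flags[prefix] else MONOLITH
```

Calling `cut_over("/notifications")` sends every request under that prefix to the new service, and `roll_back` returns all of them to the monolith, so a domain has a single owner at any moment.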
Identifying Service Boundaries with Domain-Driven Design
The hardest part of the migration is not the technology. It is deciding where to draw the lines between services. Get the boundaries wrong and you end up with a distributed monolith: services that cannot be deployed independently because every operation requires a chain of calls between them. That is worse than what you started with.
We use Domain-Driven Design (DDD) to identify bounded contexts, and you should too. Start with these exercises:
Event Storming
Get your engineers and domain experts in a room (or a Miro board) and map out every domain event in your system. "OrderPlaced," "PaymentProcessed," "InventoryReserved," "ShipmentCreated." Cluster related events together. Those clusters are your candidate bounded contexts and, eventually, your microservices.
Data Ownership Analysis
For each database table, ask: which bounded context is the authoritative owner of this data? If two contexts both write to the same table, you have found a boundary that needs careful decomposition. The "Orders" table might be owned by the Order service, but the "Customers" table is owned by the Identity service. When the Order service needs customer data, it holds a reference (customer ID) and queries the Identity service, not the database directly.
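The ownership question can be partly automated. The sketch below assumes you have already collected a list of (context, table, access mode) triples, for example from query logs or a code search. The data shown is invented for illustration, and the function names are our own.

```python
from collections import defaultdict

# Hypothetical audit data: (bounded context, table, access mode).
ACCESS_LOG = [
    ("orders", "orders", "write"),
    ("orders", "customers", "read"),
    ("identity", "customers", "write"),
    ("billing", "customers", "write"),
    ("billing", "invoices", "write"),
]


def find_writers(access_log):
    """Return {table: sorted list of contexts that write to it}."""
    writers = defaultdict(set)
    for context, table, mode in access_log:
        if mode == "write":
            writers[table].add(context)
    return {table: sorted(contexts) for table, contexts in writers.items()}


def contested_tables(access_log):
    """Tables written by more than one context need careful decomposition."""
    return {
        table: contexts
        for table, contexts in find_writers(access_log).items()
        if len(contexts) > 1
    }
```

With the sample data, `contested_tables(ACCESS_LOG)` flags only `customers`, because both the billing and identity contexts write to it. Reads by the orders context do not count against ownership.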
Which Service to Extract First
Pick a service that has these properties:
- Low coupling: It does not depend on 15 other modules to function.
- High business value: Extracting it gives you a tangible benefit, such as independent scaling or faster deployments.
- Clear data ownership: It has its own tables that are not heavily joined with the rest of the schema.
In practice, notification services, authentication, and search/indexing are common first extractions because they have clear boundaries and minimal shared state. Avoid extracting your core transaction processing first. That is the most complex, most coupled part of your system, and getting it wrong will shake confidence in the entire migration.
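One rough way to shortlist candidates is to count how many other modules each one depends on. The sketch below assumes a hypothetical dependency map (the module names and edges are invented) and ranks by outgoing dependencies only. It says nothing about business value or data ownership, which you still have to judge by hand.

```python
# Hypothetical dependency map: module -> modules it needs in order to function.
DEPENDENCIES = {
    "notifications": {"identity"},
    "search": {"catalog"},
    "orders": {"identity", "catalog", "payments", "inventory", "notifications"},
    "payments": {"identity", "orders"},
    "identity": set(),
    "catalog": set(),
    "inventory": {"catalog"},
}


def rank_candidates(deps):
    """Order modules from fewest to most outgoing dependencies (ties by name)."""
    return sorted(deps, key=lambda module: (len(deps[module]), module))
```

In this made-up graph, `orders` lands last with five dependencies, which matches the advice to leave core transaction processing for later.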
Data Decomposition: The Part Everyone Gets Wrong
Splitting code into separate services is straightforward. Splitting the database is where migrations go off the rails. Your monolith probably has a single database with hundreds of tables, foreign key constraints across domain boundaries, and stored procedures that join data from five different contexts. You cannot just point two services at the same database and call it microservices. That is a recipe for deadlocks, schema migration conflicts, and invisible coupling.
The Database-Per-Service Pattern
Each microservice should own its data store. That means the Order service has its own database (or schema), the Inventory service has its own, and they never reach into each other's tables. This sounds simple, but the execution is tricky. Here is the approach we use:
- Phase 1, Read replica: The new service reads from a replica of the monolith database. This gets you running quickly, but the service is still coupled to the monolith's schema.
- Phase 2, Dual writes: The new service writes to its own database while the monolith continues writing to the old tables. A synchronization job keeps them in sync. This phase is temporary and fragile. Keep it short.
- Phase 3, Cutover: Stop writing to the old tables. The new service's database is now the source of truth. Other services access this data through the service's API, never directly.
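Before moving from dual writes to cutover, it helps to verify that the two stores actually agree. The following is a minimal sketch of such a check, assuming both sides can be loaded as dictionaries keyed by primary key. The function names and the report shape are hypothetical, and a real job would page through the tables instead of loading them whole.

```python
def reconcile(monolith_rows, service_rows):
    """Compare rows keyed by primary key and report drift between the two stores."""
    missing = sorted(k for k in monolith_rows if k not in service_rows)
    extra = sorted(k for k in service_rows if k not in monolith_rows)
    mismatched = sorted(
        k
        for k in monolith_rows
        if k in service_rows and monolith_rows[k] != service_rows[k]
    )
    return {
        "missing_in_service": missing,
        "extra_in_service": extra,
        "mismatched": mismatched,
    }


def safe_to_cut_over(report):
    """Cut over only when the report shows no drift of any kind."""
    return not any(report.values())
```

Running this on a schedule during Phase 2 gives you a concrete signal for when the synchronization job can be retired.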
For tables that are shared across contexts, you have two options. First, duplicate the data. Each service keeps its own copy and stays in sync through domain events. The Order service publishes an "OrderShipped" event, and the Analytics service consumes it to update its own reporting tables. Second, create a shared data service, but only for truly cross-cutting concerns like audit logs or configuration. Do not let "shared data service" become a backdoor to rebuilding the monolith.
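To make the first option concrete, here is a sketch of a consumer that keeps its own copy of shipping data up to date from "OrderShipped" events. The event fields and the `AnalyticsStore` class are assumptions for illustration. The de-duplication by event ID is there because many brokers deliver a message at least once, so the same event can arrive twice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderShipped:
    event_id: str    # unique per event, used for de-duplication
    order_id: str
    shipped_at: str  # ISO 8601 timestamp


@dataclass
class AnalyticsStore:
    """In-memory stand-in for the Analytics service's own reporting tables."""

    shipped_orders: dict = field(default_factory=dict)
    seen_event_ids: set = field(default_factory=set)

    def handle(self, event):
        """Apply the event once; return False for a duplicate delivery."""
        if event.event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event.event_id)
        self.shipped_orders[event.order_id] = event.shipped_at
        return True
```

The Analytics service never queries the Order service's database. It only reacts to what the Order service chooses to publish.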
If your database is already under heavy load, you will want to scale your database before starting the decomposition. Trying to split a database that is already falling over adds unnecessary risk to an already complex process.
Handling Transactions Across Services
In a monolith, you wrap related operations in a database transaction. In microservices, you cannot do that across service boundaries. Instead, use the Saga pattern: a sequence of local transactions coordinated by events, where each step has a compensating action that undoes it. If the payment service charges the customer but the inventory service fails to reserve the item, the inventory service publishes a failure event and the payment service responds by refunding the charge. This is more complex than a database transaction, but it keeps services consistent without distributed locks or two-phase commit.
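The compensation logic is easier to see in code. For brevity, this sketch uses a small in-process orchestrator rather than events on a broker, so treat it as an illustration of the compensation order, not of the messaging. The step functions and the `run_saga` helper are hypothetical.

```python
class SagaFailed(Exception):
    """Raised after a step fails and earlier steps have been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo()
            raise SagaFailed(f"{name} failed: {exc}") from exc
        completed.append((name, compensate))


log = []


def charge_payment():
    log.append("payment charged")


def refund_payment():
    log.append("payment refunded")


def reserve_inventory():
    raise RuntimeError("out of stock")


def release_inventory():
    log.append("inventory released")


try:
    run_saga([
        ("charge", charge_payment, refund_payment),
        ("reserve", reserve_inventory, release_inventory),
    ])
except SagaFailed as exc:
    log.append(str(exc))
```

Only the completed step is compensated: the payment is refunded, and `release_inventory` never runs because the reservation never succeeded.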
API Gateway and Service Communication
Once you have more than two services, you need to think carefully about how they talk to each other and how external clients access them. Without a clear communication strategy, you will end up with a tangled web of point-to-point HTTP calls that is impossible to debug.
The API Gateway Layer
An API gateway sits between your clients and your services. It handles routing, authentication, rate limiting, and request transformation. For most teams, we recommend starting with Kong or AWS API Gateway. If you are already on Kubernetes, consider using an ingress controller like Traefik or Ambassador as your gateway.
The gateway should be thin. It routes requests and enforces security policies. It does not contain business logic. If you find yourself writing custom transformation logic in your gateway, that logic belongs in a service.
Synchronous vs. Asynchronous Communication
This is where many teams make a critical mistake. They default to synchronous REST calls for all service-to-service communication. That creates tight coupling, cascading failures, and latency chains. If Service A calls Service B, which calls Service C, a slowdown in C makes A slow too.
Use synchronous communication (REST or gRPC) only when the caller genuinely needs an immediate response. Examples: user authentication, payment validation, real-time data lookups.
Use asynchronous communication (message queues or event streams) for everything else. When an order is placed, publish an "OrderPlaced" event to a message broker (RabbitMQ, Amazon SQS, or Kafka). The inventory service, notification service, and analytics service each consume that event independently. If the notification service is down, the messages queue up and get processed when it recovers. No cascading failures, no tight coupling.
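The queue-up-and-recover behavior can be shown with a toy broker. This in-memory class is only a stand-in for RabbitMQ, SQS, or Kafka and does not mirror any of their APIs. The class and method names are invented for this sketch.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy message broker: one pending queue per subscriber."""

    def __init__(self):
        self.queues = defaultdict(deque)        # subscriber name -> pending events
        self.subscriptions = defaultdict(list)  # topic -> subscriber names

    def subscribe(self, topic, subscriber):
        self.subscriptions[topic].append(subscriber)

    def publish(self, topic, event):
        # The publisher returns immediately; it never calls consumers directly.
        for subscriber in self.subscriptions[topic]:
            self.queues[subscriber].append(event)

    def drain(self, subscriber, handler):
        """Deliver pending events to a consumer that is currently up."""
        delivered = 0
        while self.queues[subscriber]:
            handler(self.queues[subscriber].popleft())
            delivered += 1
        return delivered
```

If the notification consumer never calls `drain` while it is down, its events simply wait in its queue, and the inventory consumer keeps processing its own copy of each event unaffected.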
Our Recommended Communication Stack
- Client to service: REST over HTTPS through your API gateway. Use OpenAPI specs for documentation.
- Service to service (sync): gRPC for internal calls. It is usually faster than JSON over REST thanks to binary Protocol Buffers encoding over HTTP/2, it is strongly typed, and it supports streaming.
- Service to service (async): Amazon SQS for simple queues, Kafka or Redpanda for event streaming when you need replay and per-partition ordering guarantees.
Testing and Deployment Strategies
Testing a microservices system is fundamentally different from testing a monolith. In a monolith, you run your test suite and you know whether the system works. In a microservices architecture, each service might pass its own tests while the system as a whole is broken because of an incompatible API change between services.
The Testing Pyramid for Microservices
- Unit tests: Same as always. Test your business logic in isolation. Fast, cheap, essential.
- Integration tests: Test each service against its real dependencies (database, message broker) using Docker Compose or Testcontainers. These catch issues that mocks hide.
- Contract tests: This is the critical addition. Use Pact or a similar tool to verify that Service A's expectations about Service B's API match what Service B actually provides. Run contract tests in CI for every service. If a provider changes its API in a way that breaks a consumer, the build fails before anything reaches production.
- End-to-end tests: Run a small suite of critical user journeys against a staging environment with all services deployed. Keep this suite small (10 to 20 tests) because E2E tests in a microservices system are slow and flaky. They exist to catch integration issues, not to test business logic.
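To show the idea behind contract tests without tying it to a specific tool, here is a hand-rolled check. It is not Pact's API. The contract format, the field names, and the `contract_violations` function are assumptions made for this sketch.

```python
# The consumer's expectation of the provider's response for a customer lookup.
# Field names are hypothetical.
CONSUMER_CONTRACT = {"id": str, "email": str, "active": bool}


def contract_violations(contract, response):
    """Return human-readable violations; an empty list means compatible."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected_type):
            actual = type(response[field_name]).__name__
            problems.append(
                f"{field_name}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Extra fields in the response are allowed on purpose, so a provider can add data without breaking consumers. Removing a field or changing its type fails the check, which is the kind of change you want CI to stop.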
Deployment: One Service at a Time
The whole point of microservices is independent deployability. Each service should have its own CI/CD pipeline, its own Docker image, and its own deployment configuration. We typically use GitHub Actions or GitLab CI for the pipeline, Docker for containerization, and either Kubernetes (EKS, GKE) or a managed platform (AWS ECS, Railway) for orchestration. If you are evaluating orchestration options, our comparison of Kubernetes vs. serverless can help you decide.
For the migration period specifically, deploy new services using blue-green or canary deployments. With a canary, route 5% of traffic to the new service, monitor error rates and latency for 30 minutes, then gradually increase to 100%. If anything looks wrong, shift traffic back to the previous version immediately. Tools like Argo Rollouts (for Kubernetes) or AWS CodeDeploy make this straightforward.
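The promote-or-roll-back decision at the end of each monitoring window can be written down explicitly. The sketch below is one possible policy, not a recommendation. Only the 5% starting point comes from the text above; the later ramp steps and both thresholds are assumptions you would tune for your own system.

```python
from dataclasses import dataclass

# Percent of traffic sent to the new service at each stage (assumed ramp).
RAMP_STEPS = [5, 25, 50, 100]


@dataclass
class WindowStats:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float


def next_weight(current, canary, baseline,
                max_error_delta=0.01, max_latency_ratio=1.5):
    """Return the next canary weight in percent, or 0 to roll back."""
    too_many_errors = canary.error_rate > baseline.error_rate + max_error_delta
    too_slow = canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    if too_many_errors or too_slow:
        return 0
    later = [weight for weight in RAMP_STEPS if weight > current]
    return later[0] if later else 100
```

Comparing the canary against the baseline measured over the same window, instead of against a fixed number, keeps a system-wide incident from being blamed on the new service.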
CI/CD During the Migration
During the transition, you will have a hybrid system: the monolith plus some extracted services. Set up your CI/CD so that changes to the monolith trigger the monolith pipeline, and changes to each service trigger that service's pipeline. A monorepo with path-based triggers works well here. If you have not set up CI/CD yet, our guide on how to set up CI/CD covers the fundamentals.
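Path-based triggering boils down to mapping changed files to pipelines. Hosted CI systems offer this as configuration (for example, path filters in GitHub Actions or change rules in GitLab CI), so you would rarely write it yourself. The sketch below only illustrates the logic, and the directory layout and pipeline names are hypothetical.

```python
# Hypothetical monorepo layout: directory prefix -> pipeline name.
PIPELINES = {
    "monolith/": "monolith-pipeline",
    "services/notifications/": "notifications-pipeline",
    "services/search/": "search-pipeline",
}


def pipelines_to_run(changed_files):
    """Return the sorted list of pipelines triggered by the changed paths."""
    triggered = set()
    for path in changed_files:
        for prefix, pipeline in PIPELINES.items():
            if path.startswith(prefix):
                triggered.add(pipeline)
    return sorted(triggered)
```

A commit that touches only `services/search/` runs only the search pipeline, and files outside every listed prefix trigger nothing.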
Observability: You Cannot Debug What You Cannot See
In a monolith, when something breaks, you check one set of logs and one set of metrics. In a microservices architecture, a single user request might touch five services, three databases, and two message queues. Without proper observability, debugging a production issue becomes a scavenger hunt.
The Three Pillars
Distributed tracing is non-negotiable. Every request gets a unique trace ID that follows it across service boundaries. When a request fails or is slow, you pull up the trace and see exactly which service is the bottleneck. Use OpenTelemetry (the industry standard) to instrument your services, and send traces to Jaeger, Grafana Tempo, or Datadog.
Centralized logging aggregates logs from all services into a single searchable system. Each log entry includes the trace ID so you can correlate logs with traces. Use the ELK stack (Elasticsearch, Logstash, Kibana), Grafana Loki, or a managed service like Datadog Logs. Structured JSON logging is essential. Do not rely on plain text log lines that you have to parse with regex.
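Here is a minimal sketch of structured JSON logging that stamps every entry with a trace ID, using only the Python standard library. The `trace_id_var` context variable is a stand-in: with OpenTelemetry you would read the ID from the current span instead. The service name and trace ID value are made up.

```python
import contextvars
import io
import json
import logging

# Holds the trace ID for the request currently being handled (stand-in for
# the ID a tracing library would provide).
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        })


def make_logger(service_name, stream):
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = make_logger("order-service", buffer)
trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")  # hypothetical trace ID
log.info("order %s placed", "o-123")
entry = json.loads(buffer.getvalue())
```

Because every entry is a JSON object with a `trace_id` field, your log system can filter by that field directly instead of parsing free text.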
Metrics and alerting track the health of each service and the system as a whole. At minimum, monitor the four golden signals for every service: latency (p50, p95, p99), traffic (requests per second), errors (error rate as a percentage), and saturation (CPU, memory, connection pool usage). Use Prometheus with Grafana dashboards, or Datadog if you want a managed solution.
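To make three of the four signals concrete, the sketch below derives latency percentiles, traffic, and error rate from a list of (latency, status code) samples. It uses the nearest-rank percentile method, treats status codes of 500 and above as errors, and leaves out saturation because that comes from resource metrics rather than request logs. All of these are assumptions of this sketch, and in production Prometheus or Datadog would compute the figures for you.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def golden_signals(requests, window_seconds):
    """Summarize (latency_ms, status_code) samples collected over a window."""
    latencies = sorted(latency for latency, _status in requests)
    errors = sum(1 for _latency, status in requests if status >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "requests_per_second": len(requests) / window_seconds,
        "error_rate_pct": 100 * errors / len(requests),
    }
```

Tracking p95 and p99 alongside p50 matters because a healthy median can hide a slow tail that a small share of users hits on every request.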
Health Checks and Circuit Breakers
Every service needs a health check endpoint that your orchestrator and load balancer can poll. If a service is unhealthy, traffic gets routed to healthy instances automatically. Combine this with circuit breakers (using a library like Resilience4j for Java or Polly for .NET) to prevent cascading failures. When a downstream service is failing, the circuit breaker opens, returns a fallback response immediately, and periodically tests whether the downstream service has recovered.
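The open, fail-fast, and retest cycle is simple enough to sketch directly. This is a simplified illustration of the state machine, not the behavior of Resilience4j or Polly, and the class, its parameters, and their defaults are assumptions.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker with closed, open, and half-open behavior."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast, skip the downstream call
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `clock` parameter exists so tests can control time. In normal use you would leave it at the default and wrap each downstream call in `breaker.call(do_request, fallback)`.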
Set up these observability tools before you extract your first service, not after. Migrating without observability is like performing surgery in the dark. You need to see what is happening in real time to make confident decisions about routing traffic from the monolith to new services.
Your Migration Roadmap: Putting It All Together
Here is the timeline we use when planning a monolith-to-microservices migration for our clients. Adjust the durations based on your team size and system complexity, but do not skip any of these phases.
Phase 1: Foundation (Weeks 1 to 4)
- Deploy an API gateway in front of your monolith. All external traffic routes through it.
- Set up distributed tracing, centralized logging, and metrics dashboards.
- Containerize your monolith with Docker if it is not already containerized.
- Run event storming sessions to identify bounded contexts and service boundaries.
Phase 2: First Extraction (Weeks 5 to 10)
- Extract one low-risk, low-coupling service (notifications, search, or authentication).
- Set up that service's CI/CD pipeline, database, and monitoring.
- Use the strangler fig pattern to gradually route traffic from the monolith to the new service.
- Validate with contract tests and canary deployments.
Phase 3: Core Extractions (Weeks 11 to 30+)
- Extract additional services one at a time, prioritized by business value and coupling.
- Implement async communication (event bus) for services that do not need synchronous responses.
- Decompose the shared database into per-service databases, following the phased approach described earlier.
- Decommission old monolith code as each service reaches full production stability.
Phase 4: Optimization (Ongoing)
- Tune auto-scaling policies for each service based on real production traffic patterns.
- Optimize inter-service communication (switch chatty REST calls to gRPC or batch endpoints).
- Build internal developer platform tooling (service templates, shared libraries, deployment automation).
The biggest mistake we see teams make is treating this as a purely technical project. It is an organizational change as much as a technical one. Each service needs a clear owner, a clear API contract, and a clear SLA. Without that ownership model, you end up with services that nobody maintains and everyone blames.
If you are planning a migration and want a team that has done this dozens of times, we would love to help you build the roadmap and execute on it. Book a free strategy call and we will walk through your architecture, identify the right service boundaries, and give you an honest assessment of whether microservices are the right move for your system.