When a Cloud Migration Makes Sense (and When It Doesn't)
Cloud migrations are expensive, risky, and time-consuming. Before spending $20K to $120K migrating your infrastructure, you need a compelling reason. Cost is the most common driver: a startup that grew organically on AWS might find that GCP's committed use discounts save 30 to 40% at scale, or that Azure's enterprise agreements unlock pricing AWS cannot match. Specific service capabilities matter too: if your AI workload needs Google's TPU infrastructure or your .NET app integrates deeply with Azure Active Directory, staying on a platform that lacks those capabilities makes no sense.
Geographic expansion is another legitimate trigger. If you are entering Southeast Asia and your current provider has weak regional presence there, a migration (or at least a hybrid approach) may be necessary for latency and data residency requirements.
Where migration does not make sense: if your motivation is curiosity, mild frustration with your current provider, or chasing a marginally better price on one service type. Moving a production application with a real user base carries non-trivial risk. If you spend $5K/month on AWS and might save $800/month on GCP, even a modest $20K migration takes more than two years of those savings to break even. Run the numbers honestly before you commit.
The clearest signal that migration is worth it: you are locked into a specific proprietary service that creates real pain (vendor lock-in on a managed database with no portability, for example), or your cost savings are projected at 25% or more annually and the migration cost is recoverable within 12 months.
Phase 1: Inventory and Service Mapping
You cannot migrate what you cannot see. Start with a complete inventory of every resource in your current cloud account. For AWS, run AWS Config or use the CLI to list all running services by region. For GCP, use the Asset Inventory API. Most teams are surprised by what they find: forgotten Lambda functions, RDS instances that were supposed to be temporary, ECS clusters from deprecated features, and dozens of S3 buckets with unclear ownership.
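As a sketch of what to do with an inventory export, the snippet below groups a simplified AWS Config-style resource list by service type and flags resources with no owner tag, which are usually the forgotten ones. The export format here is illustrative, not the exact Config schema.

```python
from collections import defaultdict

def summarize_inventory(resources):
    """Group a resource export by service type and flag resources
    with no owner tag -- the usual suspects for forgotten infrastructure."""
    by_type = defaultdict(list)
    unowned = []
    for r in resources:
        by_type[r["resourceType"]].append(r["resourceId"])
        if not r.get("tags", {}).get("owner"):
            unowned.append(r["resourceId"])
    return dict(by_type), unowned

# Simplified example of what a Config export might contain (hypothetical resources).
export = [
    {"resourceType": "AWS::Lambda::Function", "resourceId": "img-resize", "tags": {}},
    {"resourceType": "AWS::S3::Bucket", "resourceId": "legacy-assets", "tags": {"owner": "platform"}},
    {"resourceType": "AWS::Lambda::Function", "resourceId": "old-webhook", "tags": {}},
]

by_type, unowned = summarize_inventory(export)
```

The untagged list is often the fastest way to find resources nobody will claim, which you can decommission before the migration rather than pay to move.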
Building Your Service Map
For each resource, document: the service type, its dependencies (what calls it, what it calls), data volume, traffic patterns, and criticality. A spreadsheet works fine. You need to know: is this on the critical path to production? Does it have persistent state? How long can it be unavailable before customers notice?
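The spreadsheet columns above map naturally onto a small record type, and the same fields give you a sensible migration order: stateless, non-critical services move first (lowest risk), stateful critical-path services move last (most planning). A minimal sketch, with hypothetical service names:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    name: str
    service_type: str                 # e.g. "RDS", "Lambda", "ElastiCache"
    depends_on: list = field(default_factory=list)
    has_state: bool = False           # persistent data that must be migrated
    critical_path: bool = False       # on the path to production traffic?
    max_downtime_minutes: int = 0     # how long before customers notice

def migration_order(records):
    """Stateless, non-critical services first; stateful critical-path last."""
    return sorted(records, key=lambda r: (r.critical_path, r.has_state))

services = [
    ServiceRecord("billing-db", "RDS", has_state=True, critical_path=True),
    ServiceRecord("thumbnailer", "Lambda"),
    ServiceRecord("session-cache", "ElastiCache", has_state=True),
]
ordered = [s.name for s in migration_order(services)]
```

A spreadsheet works just as well; the point is that these few fields are enough to sequence the whole migration.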
The next step is mapping your current services to equivalents on the destination provider. The conceptual equivalents are straightforward for compute and storage, but proprietary managed services are where friction appears. Here are the common mappings:
- Compute: AWS EC2 maps to GCP Compute Engine or Azure VMs. AWS ECS/Fargate maps to GCP Cloud Run or Azure Container Apps.
- Object storage: AWS S3 maps to GCP Cloud Storage or Azure Blob Storage. GCP Cloud Storage offers an S3-compatible interoperability API, so some S3 client code can be pointed at it with minimal changes; Azure Blob Storage uses a different API and requires SDK updates.
- Managed databases: AWS RDS (Postgres) maps to GCP Cloud SQL or Azure Database for PostgreSQL. The underlying engine is the same; supported versions, extensions, and the management layer differ.
- Message queues: AWS SQS maps to GCP Pub/Sub or Azure Service Bus. Message semantics differ slightly; test thoroughly.
- Serverless functions: AWS Lambda maps to GCP Cloud Functions or Azure Functions. Runtime support varies by language version.
- DNS and CDN: AWS Route 53 maps to GCP Cloud DNS or Azure DNS. AWS CloudFront maps to GCP Cloud CDN or Azure CDN.
The services with no direct equivalent are the ones that take the most work. AWS Cognito has no clean GCP analog; you will need to evaluate Auth0, Supabase Auth, or build your own JWT layer. AWS Step Functions has a rough equivalent in GCP Workflows, but the state-machine definitions are not portable. Identify these gaps early because they drive the majority of your engineering effort.
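The mappings above can be captured as a simple lookup table, with gaps marked explicitly so nothing slips through. This table is illustrative and far from exhaustive:

```python
# Illustrative AWS -> GCP service map; None marks gaps that need
# a bespoke replacement rather than a managed equivalent.
AWS_TO_GCP = {
    "EC2": "Compute Engine",
    "ECS/Fargate": "Cloud Run",
    "S3": "Cloud Storage",
    "RDS (Postgres)": "Cloud SQL",
    "SQS": "Pub/Sub",
    "Lambda": "Cloud Functions",
    "Route 53": "Cloud DNS",
    "CloudFront": "Cloud CDN",
    "Cognito": None,          # evaluate Auth0, Supabase Auth, or a custom JWT layer
    "Step Functions": None,   # GCP Workflows is close, but definitions are not portable
}

def migration_gaps(services_in_use):
    """Return the services with no direct destination equivalent --
    these drive the bulk of the engineering effort."""
    return [s for s in services_in_use if AWS_TO_GCP.get(s) is None]

gaps = migration_gaps(["S3", "Lambda", "Cognito", "Step Functions"])
```

Running your inventory through a table like this gives you the gap list on day one instead of discovering it mid-migration.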
Phase 2: Database Migration Strategies
Databases are where migrations succeed or fail. Unlike stateless application code, a database migration carries the risk of data loss or corruption if not done carefully. You have three main approaches, and the right one depends on your downtime tolerance and database size.
Dump and Restore (Simplest, Requires Downtime)
Take a logical dump (pg_dump for PostgreSQL, mysqldump for MySQL), transfer the file to your new environment, and restore it. This is the right approach for small databases (under 50GB) where a maintenance window of 30 to 60 minutes is acceptable. The steps are: put the application in maintenance mode, take the final dump, restore to the new database, update connection strings, bring the application back online.
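As a sketch, the dump and restore commands can be built as argument lists ready for subprocess.run; the hostnames and database names here are placeholders:

```python
def dump_command(host, db, out_file):
    # Custom format (-Fc) is compressed and supports parallel restore.
    return ["pg_dump", "-h", host, "-Fc", "-f", out_file, db]

def restore_command(host, db, in_file, jobs=4):
    # -j runs restore steps in parallel, which mostly speeds up index rebuilds.
    return ["pg_restore", "-h", host, "-d", db, "-j", str(jobs), in_file]

# Hypothetical hosts for illustration.
dump = dump_command("old-db.internal", "app", "app.dump")
restore = restore_command("new-db.internal", "app", "app.dump")
```

Using the custom dump format rather than plain SQL is the main lever for shrinking the maintenance window, since pg_restore can then parallelize the slow parts.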
For a 20GB PostgreSQL database, pg_dump typically takes 5 to 10 minutes, transfer takes another 5 minutes, and pg_restore takes 15 to 30 minutes depending on indexes and constraints. Total downtime: 30 to 45 minutes. Acceptable for many B2B SaaS products during an off-peak window.
Continuous Replication (Zero Downtime)
For production databases where any downtime is unacceptable, continuous replication is the answer. Set up logical replication from your source database to the destination. Let it sync the initial dataset (this takes hours to days for large databases), then keep the replica current via change data capture (CDC). When you are ready to cut over, you stop writes to the source, let replication catch up (typically seconds to a few minutes of lag), update your connection strings, and resume writes on the destination.
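One way to make the "let replication catch up" step concrete is a small readiness check over recent lag samples, since a single low reading can be noise. A sketch, with the thresholds as assumptions you should tune:

```python
def ready_to_cut_over(lag_samples_seconds, threshold=10, window=5):
    """Only cut over when the last `window` replication-lag samples
    are all under `threshold` seconds -- one low reading can be noise."""
    recent = lag_samples_seconds[-window:]
    return len(recent) == window and all(s < threshold for s in recent)

# Lag falling steadily after the initial sync: safe to proceed.
stable = ready_to_cut_over([120, 45, 8, 3, 2, 1, 1])
# A spike inside the window: hold the cutover.
spiky = ready_to_cut_over([3, 2, 40, 1, 1])
```

Gating the runbook on a check like this keeps a nervous operator from cutting over on one lucky lag reading.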
For PostgreSQL, the built-in logical replication (available since version 10) or the pglogical extension works well, including across major versions. For heterogeneous migrations (say, MySQL to PostgreSQL), use pgloader, or stream changes with Debezium and Kafka. MySQL has built-in binlog replication. MongoDB handles this natively through replica sets.
Managed Migration Services
AWS Database Migration Service (DMS) handles both full-load and CDC migrations from dozens of source databases to dozens of targets, including cross-cloud scenarios. It costs roughly $0.18/hour for a small replication instance, plus data transfer fees. A migration running for 2 weeks for a 200GB database costs roughly $60 to $100 in DMS fees, which is trivial compared to the engineering hours of managing replication manually. GCP's Database Migration Service offers similar functionality and is worth evaluating if you are migrating to GCP.
Whichever approach you choose, validate the migrated data before cutting over. Row counts are a minimum check. Better: compute checksums on critical tables and compare source versus destination. For financial data or anything with auditability requirements, a full row-by-row comparison on a sample of records is worth the time.
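A simple way to compare tables without shipping full rows across the wire is an order-independent checksum: hash each row's serialized form, sort the digests, and hash the result. A minimal sketch with inline sample rows standing in for query results:

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum over a table's rows. Matching values
    on source and destination strongly suggest the table copied intact."""
    digests = sorted(
        hashlib.sha256(repr(row).encode()).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

# Same rows, different physical order -- as replication often delivers them.
source_rows = [(1, "alice", "2024-01-01"), (2, "bob", "2024-01-02")]
dest_rows   = [(2, "bob", "2024-01-02"), (1, "alice", "2024-01-01")]
```

In practice you would stream rows from both databases with a cursor; sorting the digests (rather than the rows) keeps the comparison insensitive to physical row order.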
Phase 3: Application Migration and Portability
If your application is containerized, the migration is significantly simpler. Docker containers run identically on any cloud provider's container platform. If you are on AWS ECS, migrating to GCP Cloud Run or GKE is largely a matter of rewriting your deployment configuration, not your application code. This is the most compelling argument for containerizing applications even when you have no immediate plan to migrate.
Infrastructure as Code
If your infrastructure is defined in Terraform, migration is a controlled process. Terraform supports all three major cloud providers through provider plugins. You write new Terraform modules for the destination cloud, provision the infrastructure in parallel with your existing setup, test it, and then decommission the old infrastructure after cutover. The investment in Terraform pays off most visibly here: teams without IaC spend weeks manually recreating infrastructure and frequently miss configuration details.
If you are on AWS-specific tools like CloudFormation or CDK, you will need to rewrite your infrastructure definitions in Terraform or in the destination provider's native IaC tool. Budget 1 to 2 weeks of engineering time for a moderately complex infrastructure (10 to 20 services). This is not glamorous work but it is where configuration bugs and security mismatches get discovered.
SDK and API Replacement
Cloud provider SDKs are not portable. Code that calls aws-sdk to write to S3 needs to be updated to use the GCP or Azure storage SDK when migrating. The scope of this work depends on how much proprietary SDK usage is embedded in your application code.
The cleanest architecture uses thin abstraction layers: a StorageService class that wraps the underlying provider SDK, a QueueService class that wraps SQS or Pub/Sub. If you have these abstractions, swapping providers means updating the implementation behind the interface, not hunting through thousands of lines of business logic for SDK calls. If you do not have these abstractions, now is the time to add them before migrating. The refactor to add abstraction layers typically takes 1 to 3 days and makes the actual migration 5 to 10 times faster.
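A minimal sketch of that abstraction, with an in-memory implementation standing in for the vendor-backed ones (a real S3Storage or GcsStorage class would wrap boto3 or google-cloud-storage behind the same interface):

```python
from abc import ABC, abstractmethod

class StorageService(ABC):
    """Thin abstraction over object storage. Business logic codes against
    this interface; only implementations know about a vendor SDK."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStorage(StorageService):
    """Test double; swapping providers means writing one more subclass,
    not hunting SDK calls through the business logic."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

store: StorageService = InMemoryStorage()
store.put("reports/2024.csv", b"id,total\n1,42\n")
```

The in-memory double is also what makes your business logic testable without cloud credentials, which pays off well beyond the migration.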
Environment Configuration
Audit every environment variable and secret. AWS-specific values like ARNs, region identifiers, and service endpoints need to be replaced with their destination equivalents. Use your migration as an opportunity to consolidate secrets into a provider-agnostic secrets manager like HashiCorp Vault or a simple parameter store pattern that works across providers.
Phase 4: Parallel Environments and Validation
Never cut over to a new cloud provider without running both environments in parallel first. The parallel period is your safety net, and it is the phase most teams try to skip in the interest of moving faster. Skipping it leads to production incidents.
What to Run in Parallel
The minimum parallel environment includes: the full application stack on the new provider, the migrated database (kept in sync via replication), and monitoring configured identically on both sides. Send real production traffic to both environments using weighted DNS routing (e.g., 10% to new, 90% to old) and compare response times, error rates, and business metrics.
Run parallel environments for at least 1 week for non-critical applications, and 2 to 4 weeks for anything handling payments, healthcare data, or other sensitive workloads. The time cost is real but it is far less expensive than a failed cutover.
Load Testing
Before routing any production traffic to the new environment, run a load test that simulates your peak traffic. Tools like k6, Artillery, or Locust can generate realistic traffic patterns. Target at least 150% of your historical peak load to ensure the new infrastructure has adequate headroom. Pay attention to p95 and p99 latency, not just average response time. A migration that improves average latency but increases p99 from 200ms to 2s is a regression for real users.
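Load-testing tools report percentiles for you, but it is worth seeing how a bad tail hides behind a healthy average. A nearest-rank percentile sketch over synthetic latency samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: small, dependency-free, good enough
    for comparing load-test runs between environments."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 100 synthetic samples: mostly fast, with a slow tail.
latencies_ms = [120] * 90 + [450] * 8 + [2100] * 2
avg = sum(latencies_ms) / len(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Here the average looks fine while p99 sits above 2 seconds, which is exactly the regression the paragraph above warns about.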
Data Consistency Checks
If you are using continuous replication, monitor replication lag throughout the parallel period. Spikes in lag indicate the replication connection is struggling with write volume. A replication lag of more than 60 seconds during peak traffic is a warning sign that your cutover window needs to be longer to allow the replica to fully catch up before you switch writes.
Phase 5: DNS Cutover and Rollback Planning
The actual cutover is the highest-risk moment of any migration. The goal is to make it as boring as possible through preparation. Every decision about the cutover sequence should be made in advance and documented in a runbook that anyone on your team can follow under pressure.
DNS TTL Preparation
At least 48 hours before your planned cutover, reduce your DNS TTL to 60 seconds. This means DNS resolvers will refresh your records within 1 minute of a change rather than waiting for the original TTL (often 3600 seconds or more). When you update the DNS record to point to your new environment, propagation will be near-instant for most users. After the migration is stable for a week, restore TTLs to their normal values.
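For Route 53, the TTL drop (and later the cutover itself) is a ChangeResourceRecordSets call with an UPSERT change batch. A sketch of the payload shape, with a hypothetical record name and IP:

```python
def ttl_change_batch(record_name, target_ip, ttl):
    """Route 53 ChangeResourceRecordSets payload that UPSERTs an A record.
    Used here to drop the TTL ahead of cutover; the cutover itself is the
    same call with the new environment's address."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target_ip}],
            },
        }]
    }

batch = ttl_change_batch("app.example.com.", "203.0.113.10", 60)
```

This dict is what you would hand to the aws route53 change-resource-record-sets CLI command (or boto3's equivalent) as the change batch; other DNS providers have analogous APIs.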
The Cutover Sequence
A standard zero-downtime cutover sequence looks like this: first, verify replication lag is under 10 seconds. Second, update DNS to point to the new environment. Third, monitor error rates and latency in real time on the new environment. Fourth, after 15 minutes with no anomalies, confirm the cutover is complete. Fifth, keep the old environment running for 24 to 48 hours as a fallback. Sixth, decommission the old environment only after you are fully confident in the new one.
Rollback Plan
A rollback plan is not optional. Define the exact conditions that trigger a rollback: error rate above X%, p95 latency above Y ms, any payment processing failure, etc. Define the exact steps to roll back: update DNS back to the old environment, redirect writes back to the source database, notify the team. Practice the rollback in a staging environment before the production cutover so everyone knows what to do if it is needed. With a 60-second DNS TTL and continuous replication still running, a rollback takes 2 to 5 minutes. Without these preparations, it can take 30 to 60 minutes, during which your application may be down.
Realistic Costs and Timelines
Cloud migrations are almost always more expensive and slower than initial estimates. Understanding the real cost drivers helps you set expectations and budget accurately.
Cost Ranges by Application Complexity
- Small web app (1 to 5 services, under 50GB database, containerized): $10,000 to $25,000. 4 to 8 weeks total. This is a straightforward migration with 1 to 2 engineers. Service mapping takes a day, Terraform rewrite takes a week, parallel testing takes 2 weeks, cutover is uneventful.
- Mid-size SaaS (10 to 20 services, 100 to 500GB database, some proprietary SDK usage): $30,000 to $60,000. 8 to 16 weeks total. The complexity comes from replacing proprietary services (Cognito, Step Functions, SES) and managing SDK abstraction layers across a larger codebase.
- Complex platform (30 or more services, multi-region, 1TB or more database, deep vendor integration): $80,000 to $120,000 or more. 4 to 9 months total. These migrations require a dedicated team and often involve phased cutover service by service rather than a single big-bang migration.
Hidden Cost Drivers
Data transfer fees are the most commonly underestimated cost. Moving 10TB of data out of AWS costs roughly $900 in egress fees. Moving 100TB costs $9,000. This is a one-time cost, but it adds up fast for data-heavy applications. Factor this into your ROI calculation.
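The egress arithmetic is simple enough to fold into your ROI spreadsheet. A sketch using a rough headline rate; actual AWS pricing tiers down slightly at volume and varies by region, so treat this as an estimate, not a quote:

```python
# Rough AWS internet egress rate for most US regions; real pricing is
# tiered and region-dependent -- an estimate, not a quote.
EGRESS_PER_GB = 0.09  # USD

def egress_cost(terabytes):
    """One-time cost to move `terabytes` out of the provider (decimal TB)."""
    return terabytes * 1000 * EGRESS_PER_GB

ten_tb = egress_cost(10)
hundred_tb = egress_cost(100)  # flat-rate estimate; tiering lands a bit lower
```

At 10TB this is around $900, in line with the figure above; at 100TB the flat-rate estimate is $9,000, and volume tiers shave a little off in practice.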
The parallel operation period has real costs too. Running two full environments in parallel for 4 weeks at $10K/month each means an extra $10,000 in cloud spend during the migration window. For teams on tight budgets, scaling down the old environment to minimal capacity during the parallel period can reduce this.
Engineering Time Is the Biggest Variable
At a fully-loaded cost of $150 to $250/hour for senior engineers, a 400-hour migration project costs $60,000 to $100,000 in engineering time alone, before any cloud costs or tooling. The biggest lever on migration cost is how containerized and infrastructure-as-code-based your application is before you start. Teams that invest in containerization and Terraform before a migration spend 30 to 50% less on the migration itself.
We help companies plan and execute cloud migrations with zero-downtime cutover strategies and post-migration cost optimization. If you are weighing a cloud provider switch and want a realistic assessment of what it will take, book a free strategy call and we will walk through your architecture and give you honest numbers.