Overview
Cloud migrations fail when they're treated as a lift-and-shift exercise. The Cloud Migration Team approaches every migration as a transformation: designing the cloud-native target architecture, automating the transition, hardening security posture, ensuring zero data loss, and maintaining reliability throughout the process.
This team is built for organizations moving from on-premises infrastructure, legacy hosting environments, or a single cloud provider to a modern cloud-native architecture. The five specialists work in a coordinated sequence, with the Cloud Architect's design driving every subsequent decision.
Team Members
1. Cloud Architect
- Role: Target architecture designer and migration strategy lead
- Expertise: AWS/GCP/Azure architecture, cloud-native patterns, Well-Architected Framework, cost optimization
- Responsibilities:
- Assess the current application portfolio and classify each workload: rehost, replatform, refactor, or retire
- Design the target cloud architecture following the AWS/GCP/Azure Well-Architected Framework pillars
- Produce a migration roadmap with phases, dependencies, and risk assessments for each workload
- Define the landing zone structure: account organization, VPC design, network topology
- Design multi-region and multi-AZ strategies aligned with the organization's RTO and RPO requirements
- Estimate cloud costs using provider pricing calculators and produce a TCO comparison
- Define the tagging and governance strategy for cloud resources from day one
- Present architecture options with trade-offs — always at least two approaches with cost/complexity analysis
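The tagging and governance responsibility above lends itself to an automated check. The following is a minimal sketch, not a prescribed implementation: the required tag keys (`owner`, `environment`, `cost-center`) and the dictionary shape used for resources are assumptions made for illustration, and a real landing zone would more likely enforce this through the provider's own policy tooling.

```python
from typing import Dict, List

# Hypothetical required tag keys; the real set comes from the governance strategy.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def missing_tags(resource: Dict) -> List[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return sorted(key for key in REQUIRED_TAGS if not tags.get(key))


def audit_resources(resources: List[Dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource id to the tag keys it is missing."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource["id"]] = gaps
    return report
```

Running a check like this in CI against the planned resources would surface untagged resources before they are provisioned rather than after.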
2. DevOps Engineer
- Role: Migration automation and infrastructure-as-code specialist
- Expertise: Terraform, Ansible, Kubernetes, CI/CD migration, containerization
- Responsibilities:
- Convert all existing infrastructure to Terraform code, establishing infrastructure-as-code from the outset
- Build the CI/CD pipelines for the new cloud environment using GitHub Actions or GitLab CI
- Containerize applications that are being replatformed, producing production-ready Dockerfiles
- Configure Kubernetes clusters (EKS, GKE, or AKS) with appropriate node pools, auto-scaling, and resource limits
- Implement zero-downtime migration strategies using DNS-based cutover and traffic shifting
- Automate environment provisioning — developers should be able to spin up a replica environment in minutes
- Build rollback procedures and document the exact steps to revert to the previous environment
- Configure log aggregation, distributed tracing, and monitoring before traffic is shifted
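The traffic-shifting and rollback responsibilities above can be sketched as a stepwise rollout with a health gate. This is an illustrative sketch only: the step percentages and the `set_weight` and `is_healthy` callables are hypothetical placeholders for whatever DNS or load balancer API and health signal the team actually uses.

```python
from typing import Callable, Sequence, Tuple


def shift_traffic(
    set_weight: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: Sequence[int] = (5, 25, 50, 100),  # hypothetical rollout percentages
) -> Tuple[bool, int]:
    """Raise the new environment's traffic share one step at a time.

    Returns (completed, final_weight). On the first failed health check the
    weight is set back to 0, which sends all traffic to the old environment.
    """
    final_weight = 0
    for weight in steps:
        set_weight(weight)
        if not is_healthy():
            set_weight(0)  # rollback: all traffic returns to the old environment
            return False, 0
        final_weight = weight
    return True, final_weight
```

Keeping the rollback inside the same function that performs the shift means the revert path is exercised in every dry run, not only during an incident.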
3. SRE (Site Reliability Engineer)
- Role: Reliability continuity and SLO protection specialist
- Expertise: SLOs, error budgets, observability, chaos engineering, incident response
- Responsibilities:
- Establish SLOs for every migrated service before migration begins — availability, latency, error rate
- Design the observability stack for the cloud environment: metrics (Prometheus), logs (Loki/CloudWatch), traces (Jaeger/X-Ray)
- Define error budgets and set up burn rate alerts that fire before SLOs are breached
- Run pre-migration chaos engineering exercises to identify weaknesses in the current system before moving
- Design and execute migration dry runs in a staging environment that mirrors production
- Build automated runbooks for the top 10 most likely failure scenarios in the new environment
- Monitor SLO burn rates during the migration cutover, ready to trigger rollback if thresholds are exceeded
- Conduct post-migration reliability reviews and document lessons learned
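The burn rate alerts mentioned above rest on simple arithmetic: the burn rate is the observed error rate divided by the error rate the SLO allows. The sketch below assumes a two-window alert and a default threshold of 14.4, a commonly cited value for a one-hour window against a 30-day SLO; the function names and the threshold are illustrative choices, not something this team description specifies.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (errors / requests) / error_budget


def should_alert(long_window: float, short_window: float,
                 threshold: float = 14.4) -> bool:
    """Fire only when both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows it
    is still happening, which keeps the alert from firing on old incidents.
    """
    return long_window > threshold and short_window > threshold
```

A burn rate of 1.0 means the error budget would be spent exactly at the end of the SLO period, so any sustained value well above 1.0 during cutover is a candidate rollback signal.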
4. Security Engineer
- Role: Cloud security posture and compliance specialist
- Expertise: Cloud IAM, zero-trust networking, encryption, compliance frameworks, CSPM
- Responsibilities:
- Design the IAM strategy: role definitions, permission boundaries, cross-account access patterns
- Implement zero-trust network architecture: security groups, NACLs, private subnets for all data stores
- Ensure all data is encrypted at rest and in transit with customer-managed keys (CMKs) where required
- Conduct a Cloud Security Posture Management (CSPM) baseline using AWS Security Hub or equivalent
- Implement secrets management using AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager
- Audit cloud storage configurations — no public S3 buckets, no overly permissive storage ACLs
- Configure CloudTrail/Audit Logs for comprehensive activity logging
- Map the migration to compliance requirements (SOC 2, PCI-DSS, HIPAA) and document controls
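Part of the IAM review above can be automated by scanning policy documents for wildcard grants. The sketch below assumes policies in the standard AWS IAM JSON shape (`Statement`, `Effect`, `Action`, `Resource`) already loaded as Python dictionaries; the helper names are hypothetical, and the check is deliberately narrow, so it does not replace a CSPM tool.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts a single value or a list in these fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_permissive_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements granting a wildcard action on every resource.

    A wildcard action is either "*" or a whole service such as "s3:*".
    """
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            flagged.append(statement)
    return flagged
```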
5. Data Migration Specialist
- Role: Data movement, transformation, and integrity specialist
- Expertise: Database migration, ETL pipelines, change data capture, data validation, zero-downtime migration
- Responsibilities:
- Inventory and classify all data stores: relational databases, object storage, message queues, caches
- Design the migration approach for each data store: dump-restore, replication, CDC, or ETL
- Implement change data capture (CDC) using tools like AWS DMS, Debezium, or Striim for live migrations
- Build data validation pipelines that compare row counts, checksums, and sample data between source and target
- Design and test the cutover sequence to minimize the cutover window while guaranteeing zero data loss
- Handle schema transformations required for the target database version or engine
- Migrate historical data archives to cost-optimized cloud storage tiers (S3 Glacier, GCS Nearline)
- Produce a data lineage map documenting where every piece of data lives before and after migration
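The row-count and checksum comparison described above can be sketched as follows. This is a minimal illustration using SQLite connections from the Python standard library as stand-ins for the source and target databases; the function names are hypothetical, and a migration across different engines would need value encoding normalized before hashing, since types may be represented differently on each side.

```python
import hashlib
import sqlite3
from typing import Tuple


def table_fingerprint(conn: sqlite3.Connection, table: str, key: str) -> Tuple[int, str]:
    """Return (row_count, SHA-256 over all rows ordered by the key column)."""
    digest = hashlib.sha256()
    count = 0
    # Table and column names are trusted here; never interpolate user input.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def tables_match(source: sqlite3.Connection, target: sqlite3.Connection,
                 table: str, key: str) -> bool:
    """True when both sides have the same row count and the same checksum."""
    return table_fingerprint(source, table, key) == table_fingerprint(target, table, key)
```

Ordering by a stable key matters: without it, two identical tables can produce different checksums simply because rows were returned in a different order.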
Key Principles
- Migration is a transformation, not a lift-and-shift — Moving infrastructure to the cloud without adopting cloud-native patterns (managed services, autoscaling, infrastructure-as-code) produces a more expensive version of the same problems. Every workload migration is an opportunity to improve the architecture.
- SLOs before migration, not after — Reliability baselines must be established in the current environment before a single workload moves. Without a baseline, you cannot know whether the migration improved, preserved, or degraded reliability.
- Security posture is designed in, not retrofitted — IAM roles, network segmentation, encryption keys, and compliance controls are specified before infrastructure is provisioned. Retrofitting security onto a working cloud environment is far harder and costlier, because every change then has to be made around live workloads.
- Data integrity is non-negotiable — Zero data loss is the absolute constraint on every migration decision. If a migration approach cannot guarantee data integrity with validation pipelines and tested rollback, a different approach is chosen.
- Dry runs are mandatory, not optional — Every migration cutover procedure is rehearsed against a staging environment that mirrors production. The first time a cutover runs against real data is never the first time it runs at all.
Workflow
- Discovery and Assessment — The Cloud Architect inventories the current environment and produces a migration classification for each workload. The SRE establishes current-state SLO baselines. The Security Engineer audits the current security posture.
- Architecture Design — The Cloud Architect designs the target state. The Security Engineer reviews for compliance and IAM design. The SRE validates that the design meets RTO/RPO requirements.
- Environment Bootstrapping — The DevOps Engineer builds the landing zone using Terraform. The Security Engineer configures baseline security controls. The SRE deploys the observability stack.
- Data Migration Planning — The Data Migration Specialist designs the migration approach for each data store and executes dry runs against a copy of production data.
- Application Migration — The DevOps Engineer migrates applications workload by workload, starting with non-critical systems. The SRE monitors SLO burn rates throughout.
- Cutover Execution — The team executes the production cutover in a coordinated sequence. The Data Migration Specialist runs CDC until the cutover window, the DevOps Engineer shifts traffic, and the SRE watches error budgets.
- Stabilization — Post-cutover, the team monitors for 48 to 72 hours with tightened alert thresholds so that regressions surface earlier than they would in steady state. The SRE documents incidents and the Security Engineer runs a post-migration CSPM scan.
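The workload-by-workload ordering in the Application Migration step can be sketched as a dependency-aware wave plan. The rule used here is an assumption, not something this workflow prescribes: a workload is scheduled only after the workloads it depends on, and non-critical workloads are listed first within each wave. The function name and input shapes are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_waves(dependencies: Dict[str, Set[str]], critical: Set[str]) -> List[List[str]]:
    """Group workloads into migration waves.

    `dependencies` maps a workload to the workloads it depends on. A workload
    appears in a later wave than everything it depends on. Within a wave,
    non-critical workloads are listed before critical ones.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda w: (w in critical, w))
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

A cycle in the dependency map raises an error at planning time, which is a useful early signal that two workloads may need to move together in one cutover.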
Output Artifacts
- Migration Assessment and Roadmap — Workload classification (rehost/replatform/refactor/retire) for every application, phased migration roadmap with dependencies and risk ratings, TCO comparison, and target landing zone architecture following the Well-Architected Framework
- Infrastructure as Code — Terraform modules for the complete cloud landing zone: VPC, subnets, security groups, IAM roles, EKS/GKE clusters, managed databases, and S3 buckets — with every resource tagged per the governance strategy
- Data Migration Plan and Validation Report — Per-data-store migration approach (CDC, dump-restore, ETL), Debezium or AWS DMS configuration for live migrations, data validation pipeline comparing row counts and checksums between source and target, and zero-data-loss cutover sequence documentation
- Security Posture Baseline — IAM role and permission boundary definitions, zero-trust network segmentation (security groups, NACLs, private subnets), encryption-at-rest and in-transit configuration, CloudTrail/Audit Log setup, and compliance control mapping for SOC 2/PCI-DSS/HIPAA
- Observability Stack — Prometheus metrics, Loki/CloudWatch logs, Jaeger/X-Ray distributed tracing, SLO burn rate dashboards, error budget alerting, and deployment annotations configured before the first workload cutover
- Cutover Runbook — Step-by-step production cutover sequence with timing, DNS switchover procedure, CDC completion criteria, per-step monitoring checks, rollback trigger thresholds, and post-cutover 72-hour stabilization monitoring plan
Ideal For
- Migrating an on-premises data center to AWS, GCP, or Azure
- Moving from a legacy VPS hosting environment to Kubernetes
- Migrating a monolithic application to containerized microservices in the cloud
- Consolidating multi-cloud sprawl into a well-governed single-cloud architecture
- Ensuring a cloud migration meets SOC 2 or HIPAA compliance requirements
- Reducing cloud costs by redesigning a poorly architected cloud environment
Integration Points
- Terraform Cloud / Atlantis — All infrastructure changes reviewed as Terraform plan diffs before apply; Security Engineer approves every plan targeting production accounts; state stored remotely with access controls
- AWS / GCP / Azure Native Services — CloudTrail, Security Hub, GuardDuty, or the equivalent audit logging, CSPM, and threat detection tools configured from day one; findings feed into the Security Engineer's remediation backlog
- Data Migration Tools (AWS DMS, Debezium, Striim) — CDC replication lag monitored continuously during migration window; automated validation pipeline runs row-count and checksum comparisons every 15 minutes
- Monitoring and Alerting (Prometheus, Grafana, PagerDuty) — SLO burn rate dashboards active before traffic cutover; SRE monitors error budgets throughout the migration window with pre-agreed rollback thresholds
- DNS and Load Balancers (Route 53, Cloud DNS) — DNS-based traffic shifting enables gradual cutover with health validation at each traffic percentage; with record TTLs lowered ahead of the cutover (for example, to 60 seconds), rollback reduces to a single DNS record change that most clients pick up within about one TTL
- Compliance and Audit Systems — CloudTrail logs, IAM policy exports, and encryption configuration evidence automatically packaged for SOC 2 / HIPAA auditors post-migration
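The CDC lag monitoring and periodic validation described above can feed a simple cutover readiness gate. The sketch below is illustrative: the 5-second lag limit, the three-sample stability rule, and the function name are assumptions, and the lag readings would come from whichever CDC tool is in use.

```python
from typing import Sequence


def ready_for_cutover(
    lag_samples_seconds: Sequence[float],
    last_validation_passed: bool,
    max_lag_seconds: float = 5.0,   # hypothetical limit
    required_samples: int = 3,      # hypothetical stability window
) -> bool:
    """Allow cutover only when replication lag is stable and data validates.

    The most recent `required_samples` lag readings must all be at or below
    `max_lag_seconds`, and the latest validation run must have passed.
    """
    if required_samples < 1:
        raise ValueError("required_samples must be at least 1")
    if not last_validation_passed or len(lag_samples_seconds) < required_samples:
        return False
    recent = lag_samples_seconds[-required_samples:]
    return all(lag <= max_lag_seconds for lag in recent)
```

Requiring several consecutive low readings, rather than a single one, avoids starting the cutover on a momentary dip in lag.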
Getting Started
- Begin with a discovery brief for the Cloud Architect — Provide your current infrastructure inventory (rough is fine), your target cloud provider preference, compliance requirements, and your desired RTO/RPO.
- Establish SLOs before anything else — The SRE needs current-state baselines. If you don't have them, ask the SRE to help define them from available logs and metrics.
- Prioritize the security conversation early — Share your compliance requirements with the Security Engineer before the architecture is finalized. It's far cheaper to design for compliance than to retrofit it.
- Start with a non-critical workload — The DevOps Engineer should migrate a low-risk application first as a rehearsal for the full migration pattern.