Terraform Across Azure, AWS, and OCI: Hard Lessons
I manage Terraform infrastructure across three cloud providers: Azure, AWS, and Oracle Cloud Infrastructure. Each has its own provider quirks, and the interaction between them creates failure modes that no single-cloud tutorial covers. Here are the lessons that cost me production incidents.
Provider Version Pinning: The ~> Operator Saved Us
Our first major incident: a terraform apply destroyed a production AKS cluster because we used >= version constraints, which let a newer Azure provider release in without review. That minor update changed a default value, and Terraform helpfully "corrected" our cluster to match.
# DANGEROUS - allows any future version
terraform {
  required_providers {
    azurerm = { source = "hashicorp/azurerm", version = ">= 3.0" }
  }
}
# SAFE - allows only patch updates
terraform {
  required_providers {
    azurerm = { source = "hashicorp/azurerm", version = "~> 3.85.0" }
  }
}
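The ~> operator (Terraform's "pessimistic constraint") allows only the rightmost specified version component to increment. ~> 3.85.0 allows 3.85.0 and any later 3.85.x patch release but blocks 3.86.0; by contrast, ~> 3.85 would accept every 3.x release from 3.85 onward. Update providers deliberately through tested PRs, never automatically.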
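In a multi-cloud root module, the same discipline applies to every provider and to Terraform itself. A sketch of a fully pinned block covering all three clouds; the version numbers are illustrative placeholders, not recommendations, so substitute the versions you have actually tested.

```hcl
terraform {
  # Pin Terraform core as well: a runner with a different minor version
  # will refuse to run this configuration instead of silently using it.
  required_version = "~> 1.6.0"

  required_providers {
    # Illustrative versions only; each "~> X.Y.Z" admits X.Y.* patch releases
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.85.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.31.0"
    }
    oci = {
      source  = "oracle/oci"
      version = "~> 5.23.0"
    }
  }
}
```

Commit the .terraform.lock.hcl file that terraform init generates alongside this block. It records the exact provider versions and checksums selected, so every run uses the same build until someone changes it in a reviewed PR.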
State File Splitting: One State Per Service Boundary
A single state file for "all of production" is a ticking time bomb. When networking and compute share a state, a VNet change requires a plan that touches every VM, load balancer, and database. The blast radius is everything.
Split state files by service boundary:
- networking/: VNets, subnets, NSGs, peering
- compute/: VMs, AKS clusters, scale sets
- data/: Databases, storage accounts, caches
- identity/: Service principals, managed identities, RBAC
Each state file gets its own backend (Azure Storage container, S3 bucket) with state locking. A networking change can't accidentally destroy your databases because they're in completely separate state files.
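As a sketch, this is roughly what the backend block for the networking/ state might look like. The resource group, storage account, bucket, and table names are hypothetical. What matters is that each service boundary writes to its own state key and that the backend provides locking: Azure Storage through blob leases, and S3 in this example through a DynamoDB lock table.

```hcl
# networking/backend.tf (Azure): one state blob per service boundary.
# All names below are hypothetical placeholders.
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstateprod"
    container_name       = "tfstate"
    key                  = "networking.tfstate" # compute/ would use "compute.tfstate"
  }
}

# Equivalent for an AWS-hosted state. It is commented out because a single
# configuration can declare only one backend.
# terraform {
#   backend "s3" {
#     bucket         = "example-tfstate-prod"
#     key            = "networking/terraform.tfstate"
#     region         = "us-east-1"
#     dynamodb_table = "example-tfstate-locks" # enables state locking
#     encrypt        = true
#   }
# }
```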
The remote_state Anti-Pattern
The terraform_remote_state data source lets one configuration read outputs from another. It sounds like the right way to share data between state files. In practice, it's a tight-coupling anti-pattern.
The problem: if someone renames an output in the networking state, every configuration that reads it via remote_state breaks, and you only find out at each consumer's next plan. You've created a cross-state dependency that Terraform can't track.
Better alternatives: write shared values to Azure Key Vault or AWS SSM Parameter Store, or use data sources to look up resources directly. Both approaches are more resilient and don't couple consumers to another state file's output schema.
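A minimal sketch of both alternatives. It assumes an aws_vpc.main resource exists in the networking configuration and that the Azure VNet and subnet already exist; every resource name and parameter path here is hypothetical. The producer publishes a value under a stable name, and consumers read that name instead of another state file's outputs.

```hcl
# networking/ (producer): publish the VPC ID under a stable, documented name.
# aws_vpc.main is assumed to be defined elsewhere in this configuration.
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/prod/networking/vpc_id" # hypothetical naming convention
  type  = "String"
  value = aws_vpc.main.id
}

# compute/ (consumer): read the parameter instead of terraform_remote_state.
data "aws_ssm_parameter" "vpc_id" {
  name = "/prod/networking/vpc_id"
}

# Azure: look the subnet up directly by name, with no shared state involved.
data "azurerm_subnet" "app" {
  name                 = "snet-app"      # hypothetical
  virtual_network_name = "vnet-prod"     # hypothetical
  resource_group_name  = "rg-networking" # hypothetical
}

# Consumers then reference data.aws_ssm_parameter.vpc_id.value
# and data.azurerm_subnet.app.id.
```

The contract between configurations becomes an explicit parameter or resource name that both sides can see, rather than an output name buried in someone else's state.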
OCI Provider: The Compartment Confusion
Oracle Cloud's resource hierarchy uses compartments. The most common confusion: the root compartment ID is the same as the tenancy OCID. When a Terraform resource requires compartment_id, you can use the tenancy OCID for root-level resources. Resources in a sub-compartment, however, need that sub-compartment's own OCID.
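# Root compartment = tenancy OCID
variable "tenancy_ocid" { default = "ocid1.tenancy.oc1..aaaaaa..." }
# This works for root-level resources
resource "oci_core_vcn" "main" {
  compartment_id = var.tenancy_ocid # Root compartment
  cidr_block     = "10.0.0.0/16"
}
# A sub-compartment is created under the root and gets its own OCID
resource "oci_identity_compartment" "app" {
  compartment_id = var.tenancy_ocid # Parent: root compartment
  name           = "app"
  description    = "Application resources"
}
# Resources in the sub-compartment need its OCID
resource "oci_core_subnet" "app" {
  compartment_id = oci_identity_compartment.app.id # NOT var.tenancy_ocid
  vcn_id         = oci_core_vcn.main.id
  cidr_block     = "10.0.1.0/24"
}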
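When the sub-compartment is managed in a different configuration, you can look its OCID up by name instead of hardcoding it. A sketch using the oci_identity_compartments data source; the compartment name "app" is hypothetical, and the sketch assumes the compartment sits directly under the root.

```hcl
# Find an existing sub-compartment by name under the root (tenancy) compartment.
data "oci_identity_compartments" "app" {
  compartment_id = var.tenancy_ocid # Parent to search: root = tenancy OCID
  name           = "app"            # Hypothetical compartment name
  state          = "ACTIVE"
}

locals {
  # The data source returns a list; one() fails the plan if more than one
  # compartment matches and returns null if none does.
  app_compartment_id = one(data.oci_identity_compartments.app.compartments[*].id)
}
```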
Azure Auth: The Right Way
The Azure provider supports several authentication methods. Three of them matter in practice, and choosing the wrong one creates security risks:
- Managed Identity (best for CI/CD): No secrets to manage. AKS pods, Azure VMs, and GitHub Actions OIDC all support it
- Service Principal with OIDC (good for automation): No client secret in state files. Uses federated credentials
- az login (local development only): Never use it in automation. It depends on an interactive user session whose tokens expire
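Keep client_secret out of Terraform entirely. Terraform doesn't write provider configuration into state, but a secret passed in as an input variable is stored unencrypted in any saved plan file, and a secret created by a resource (such as a generated service principal password) lands in state in plaintext. If those files live in Azure Storage, anyone with access to that storage can read the secret. Use OIDC federation instead.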
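A minimal sketch of the provider side of OIDC federation with azurerm. It assumes your CI system supplies the identity's IDs through the standard ARM_CLIENT_ID, ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID environment variables. No client_secret appears anywhere in the configuration.

```hcl
provider "azurerm" {
  features {}

  # Authenticate with a federated OIDC token issued by the CI system.
  # Client, tenant, and subscription IDs come from the ARM_CLIENT_ID,
  # ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID environment variables.
  use_oidc = true

  # On an Azure VM or another host with a managed identity, use this instead:
  # use_msi = true
}
```

In GitHub Actions, the workflow also needs the id-token: write permission so the runner can request the OIDC token.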
Plan in CI: The Exit Code Trick
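# In your CI pipeline
# Capture the exit code without tripping "set -e": exit 2 is not a failure here
plan_exit=0
terraform plan -detailed-exitcode -out=tfplan || plan_exit=$?
# Exit 0 = no changes, Exit 1 = error, Exit 2 = changes detected
if [ "$plan_exit" -eq 1 ]; then
  exit 1
elif [ "$plan_exit" -eq 2 ]; then
  # Post plan output as PR comment for review
  # (PR_NUMBER is a hypothetical variable your CI must provide)
  terraform show -no-color tfplan | gh pr comment "$PR_NUMBER" --body-file -
fi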
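The -detailed-exitcode flag is crucial. Without it, terraform plan exits 0 on success whether or not changes are detected. With it, exit code 2 means "changes pending": use it to gate approvals and post plan diffs to PRs. Because most CI shells abort on any non-zero exit, capture the code explicitly rather than letting exit 2 fail the job.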
About the author: Ilir Ivezaj manages Terraform infrastructure across Azure, AWS, and Oracle Cloud for enterprise and startup environments. He's a technology executive based in Michigan.