IaC Fundamentals
Infrastructure as Code = treat infrastructure the same way you treat application code: text files in git, code review, CI/CD, tests, rollback via revert. Every cloud resource, every VM, every DNS record, every firewall rule — declared in a file, applied by a tool. The console is for reading; the code is the source of truth.
Why IaC wins
Before IaC, infra lived in people’s memory, wikis, and the console. Consequences:
- No history. “Why is this rule open?” — nobody knows; it’s been like that for years.
- No repro. Dev, staging, prod diverge silently. Bugs only happen in prod.
- No rollback. A bad change = a long ticket.
- Snowflakes. Every server is unique. Every server is fragile. Nobody wants to replace one.
IaC fixes all four:
- History → `git log` on the infra repo shows every change, author, reason.
- Repro → the same code builds dev and prod. Differences are explicit.
- Rollback → `git revert` + `terraform apply`. Minutes, not days.
- Cattle, not pets → any server is disposable; recreate it from code.
The two approaches — declarative vs imperative
The central split. See Declarative vs Imperative Automation for the deep version.
Declarative — “what”
You describe the desired state; the tool figures out how to get there.
```hcl
# Terraform
resource "aws_instance" "web" {
  ami           = "ami-0abcdef"
  instance_type = "t3.small"
  tags          = { Name = "web-01" }
}
```

Run it once → instance is created. Run it again → nothing happens (state matches). Change `t3.small` to `t3.medium` → the tool computes the diff and applies only that. This is the idempotence property (see Idempotence) that makes declarative IaC safe.
Examples: Terraform / OpenTofu, Pulumi, CloudFormation, Bicep, Kubernetes YAML, Ansible (with well-written modules).
Imperative — “how”
You describe the steps; the tool runs them.
```shell
aws ec2 run-instances --image-id ami-0abcdef --instance-type t3.small
```

Run it twice → two instances. The user is responsible for the “is it already done?” check. Brittle, but fine for one-off operational tasks (restart this service, drain this node) that aren’t trying to model steady state.
Examples: plain shell scripts, the `aws` CLI, `kubectl create`, one-off Ansible `command:` tasks.
Rule of thumb: declarative for desired-state infra; imperative only for operational actions where “run twice” has a meaningful second effect (e.g. “restart the database”).
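For contrast, the declarative way to say “two instances” makes the count part of the desired state, so re-running is harmless. A minimal sketch reusing the placeholder AMI ID from the example above; the `web-NN` naming scheme is illustrative.

```hcl
# Declarative: "two instances" is the desired state, not a step to repeat.
# Re-running `terraform apply` keeps exactly two; it never creates a third.
resource "aws_instance" "web" {
  count         = 2
  ami           = "ami-0abcdef" # placeholder AMI ID from the example above
  instance_type = "t3.small"

  tags = {
    # count.index is 0 and 1, producing web-01 and web-02
    Name = format("web-%02d", count.index + 1)
  }
}
```

Raising `count` to 3 makes the next plan add one instance; lowering it back to 2 removes the highest-indexed one (`aws_instance.web[2]`).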
The IaC tool landscape
Two big layers: provisioning (create cloud resources) and config management (set up what’s inside VMs). Modern stacks often use one tool per layer.
Provisioning tools
| Tool | Language | Scope | Notes |
|---|---|---|---|
| Terraform | HCL | Multi-cloud, multi-provider | The de-facto standard. HashiCorp moved it from MPL to the BSL in 2023 → community fork OpenTofu (largely drop-in compatible) |
| OpenTofu | HCL | Multi-cloud | MPL-licensed fork of Terraform; largely equivalent, though newer releases of the two have started to diverge |
| Pulumi | TypeScript / Python / Go / C# | Multi-cloud | “IaC in a real programming language” |
| AWS CloudFormation | YAML / JSON | AWS only | First-party, deep AWS integration |
| Azure Bicep | Bicep DSL | Azure only | Nicer front-end to ARM templates |
| Google Deployment Manager | YAML | GCP | Largely superseded by Terraform + Config Controller |
| Crossplane | Kubernetes CRDs | Multi-cloud | IaC through Kubernetes; GitOps-native |
| CDK (AWS) / CDKTF | TypeScript / Python → CFN / Terraform | AWS only (CDK); multi-cloud (CDKTF, via Terraform) | Generates templates from code |
If you’re starting today and cloud-agnostic: Terraform or OpenTofu. If single-cloud and all-in: cloud-native (CloudFormation / Bicep) is also fine.
Config management tools
| Tool | Model | Notes |
|---|---|---|
| Ansible | Agentless, SSH / WinRM; YAML playbooks | Easiest to start; see Ansible Fundamentals |
| Puppet | Agent + master; declarative DSL | Long history, enterprise, declining |
| Chef | Agent + master; Ruby DSL | Acquired by Progress; declining |
| Salt | Agent or agentless; YAML + Python | Event-driven, fast at scale |
| cloud-init | First-boot script on cloud VMs | Pair with Terraform for “bootstrap + hand off” |
| Packer | Builds golden images | Reduces config mgmt footprint at runtime |
Modern preference: bake images with Packer + minimal cloud-init, then run as immutable containers / VMs. Ansible fills the gaps.
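A sketch of the Packer + cloud-init handoff from the Terraform side, assuming a Packer build in the same account that names its AMIs `acme-base-*` and bakes in a systemd unit called `app.service` (both hypothetical names). Terraform picks the newest image and passes only a few lines of first-boot config.

```hcl
# Find the most recent golden image produced by a Packer build.
# The "acme-base-*" name pattern and owner "self" are assumptions.
data "aws_ami" "golden" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["acme-base-*"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.golden.id
  instance_type = "t3.small"

  # Minimal cloud-init: the software is already baked into the image,
  # so first boot only writes an env file and starts the (hypothetical) service.
  user_data = <<-EOF
    #cloud-config
    write_files:
      - path: /etc/app/env
        content: |
          APP_ENV=prod
    runcmd:
      - systemctl enable --now app.service
  EOF
}
```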
Terraform in one screen
The one IaC tool you most need to know in 2026.
```text
.tf files ───► terraform plan ───► diff
    │
    │          terraform apply ───► cloud API
    │                 │
    │                 ▼
    │           remote state
    │       (S3 + DynamoDB lock,
    ▼        or Terraform Cloud)
git repo
```
The core loop
```shell
terraform init      # download providers + configure backend
terraform fmt       # canonical formatting
terraform validate  # syntax/schema check
terraform plan      # show what will change
terraform apply     # do it (prompts for confirmation)
terraform destroy   # tear it down
```

Key concepts
- Resource — a cloud object (`aws_instance`, `azurerm_virtual_network`). Declared in `.tf` files.
- Provider — plugin that talks to a platform (aws, azurerm, google, kubernetes, cloudflare, github, datadog, 500+ more).
- Data source — read an existing resource without managing it (`data "aws_vpc" "default" { default = true }`).
- Variable — input (`variable "region" { default = "us-east-1" }`).
- Output — export a value to other modules or consumers.
- Module — reusable bundle of resources, like a function. Can be local or pulled from the registry.
- State — JSON file mapping declared resources to real-world IDs. Critical and sensitive.
- Backend — where state lives (local, S3+DynamoDB, GCS, Azure Storage, Terraform Cloud). Prefer remote with locking.
- Workspace — multiple state files from the same config (rarely the right abstraction for env separation — prefer separate configs).
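A small config that exercises each concept above. It is a sketch assuming the AWS provider and an account that still has a default VPC; the resource and output names and the region regex are illustrative.

```hcl
# Variable: a typed input with a default and a validation rule.
variable "region" {
  type    = string
  default = "us-east-1"

  validation {
    condition     = can(regex("^[a-z]{2}(-[a-z]+)+-[0-9]+$", var.region))
    error_message = "region must look like an AWS region, e.g. us-east-1."
  }
}

# Provider: configured from the variable.
provider "aws" {
  region = var.region
}

# Data source: read the default VPC without managing it.
data "aws_vpc" "default" {
  default = true
}

# Resource: something Terraform does manage, placed in that VPC.
resource "aws_security_group" "web" {
  name   = "web"
  vpc_id = data.aws_vpc.default.id
}

# Output: export a value for other configs or for `terraform output`.
output "web_sg_id" {
  value = aws_security_group.web.id
}
```

After an apply, `terraform output web_sg_id` prints the exported ID.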
A minimal root module
```hcl
# main.tf
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "acme-tfstate-prod"
    key            = "networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-locks"
    encrypt        = true
  }
}

# The AWS provider needs a region (or AWS_REGION in the environment).
provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"

  name = "prod"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
}
```

State is sacred
The state file maps declared resources to the real ones. It has:
- Secrets, in plaintext (DB passwords, private keys, any value a resource attribute ever held during `terraform apply`).
- The single source of truth for “who owns this resource.”
- Metadata the cloud API doesn’t persist.
Rules:
- Store remotely — S3 + DynamoDB lock, GCS, Terraform Cloud. Never local for shared projects.
- Encrypt at rest.
- Restrict access — the state bucket policy limits who can read it.
- Lock it — concurrent `apply` runs against the same state can corrupt it. A DynamoDB lock / TFC lock prevents this.
- Never edit by hand — use `terraform state rm`, `terraform import`, `terraform state mv`.
- Back it up. Versioned S3 = free state history.
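The rules above map onto a handful of AWS resources. A minimal bootstrap sketch for the S3 + DynamoDB backend, assuming the bucket and table names from the `backend "s3"` example earlier; it belongs in its own small config, applied once before the stacks that use it, and it leaves out the bucket policy that restricts who can read state.

```hcl
resource "aws_s3_bucket" "tfstate" {
  bucket = "acme-tfstate-prod"

  # Refuse to destroy the state bucket through Terraform.
  lifecycle {
    prevent_destroy = true
  }
}

# Versioning = free state history (roll back a bad state write).
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt at rest with KMS.
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State contains secrets: block every form of public access.
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string hash key named "LockID".
resource "aws_dynamodb_table" "tfstate_locks" {
  name         = "tfstate-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases also support native S3 locking via `use_lockfile = true` in the backend block; check the S3 backend docs for your version before dropping the DynamoDB table.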
If state is lost, every resource appears “new,” and re-applying would try to create duplicates (or fail on name conflicts). Recovering means re-attaching every resource with `terraform import`, which is painful.
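Since Terraform 1.5, imports can also be declared in config, so they go through the same plan and review as any other change; `moved` blocks (Terraform 1.1+) do the same for renames that would otherwise need `terraform state mv`. A sketch with a hypothetical instance ID and hypothetical resource names:

```hcl
# Adopt an existing EC2 instance into state instead of creating a new one.
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0" # hypothetical instance ID
}

resource "aws_instance" "web" {
  ami           = "ami-0abcdef"
  instance_type = "t3.small"
  tags          = { Name = "web-01" }
}

# Rename aws_security_group.frontend to .web in state, without destroy/create.
moved {
  from = aws_security_group.frontend
  to   = aws_security_group.web
}

resource "aws_security_group" "web" {
  name = "web"
}
```

For import targets that have no config yet, `terraform plan -generate-config-out=generated.tf` can draft the matching `resource` blocks for review.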
The testing story
IaC testing is younger than app testing but maturing fast:
| Layer | Tools |
|---|---|
| Static checks | terraform validate, tflint, terraform-docs |
| Policy / compliance | OPA / Conftest, Sentinel, Checkov, Trivy (which absorbed tfsec) |
| Unit | terraform test (built-in HCL test framework; `command = plan`, mock providers since 1.7) |
| Integration | Terratest (Go) or terraform test with real applies: spin up a scratch AWS account, apply + assert + destroy |
| Drift detection | terraform plan -detailed-exitcode in CI on a schedule; alert on exit code 2 (non-zero diff) |
Minimum bar: fmt, validate, tflint, and at least one policy tool (Checkov catches “open security group to 0.0.0.0/0” style mistakes) on every PR.
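The built-in framework covers the unit layer without leaving HCL. A minimal sketch, assuming the root module contains the `aws_instance.web` resource from the declarative example above; `terraform test` picks up `*.tftest.hcl` files from the root directory and from `tests/`.

```hcl
# tests/web.tftest.hcl

# Replace the real AWS provider with a mock (Terraform 1.7+),
# so the test needs no credentials.
mock_provider "aws" {}

# command = plan: nothing is created; assertions run against the plan.
run "web_instance_is_small" {
  command = plan

  assert {
    condition     = aws_instance.web.instance_type == "t3.small"
    error_message = "web must stay on t3.small"
  }

  assert {
    condition     = aws_instance.web.tags["Name"] == "web-01"
    error_message = "web must be tagged Name = web-01"
  }
}
```

Dropping the mock and `command = plan` (the default is `apply`) turns the same file into an integration test: Terraform really creates the resources, runs the assertions, then destroys them.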
Multi-environment — the repo layout question
Three common patterns:
1. Directory per env
```text
infra/
├── modules/
│   ├── vpc/
│   └── app/
├── envs/
│   ├── dev/        # calls modules with dev inputs, dev state
│   ├── staging/
│   └── prod/
```
Clear separation, different backends per env. Preferred.
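What one env directory might contain. A sketch under assumptions: the local modules `modules/vpc` and `modules/app`, their inputs (`name`, `cidr`, `vpc_id`, `instance_type`, `min_size`) and the `vpc_id` output are hypothetical, and the bucket and key names are illustrative. `envs/dev/main.tf` would be the same file with its own backend and smaller inputs.

```hcl
# envs/prod/main.tf: prod root module with its own state and prod-sized inputs.
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  # Dev's backend points elsewhere, so a dev apply never touches this state.
  backend "s3" {
    bucket         = "acme-tfstate-prod"
    key            = "app/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source = "../../modules/vpc" # same module code for every env
  name   = "prod"
  cidr   = "10.0.0.0/16"
}

module "app" {
  source        = "../../modules/app"
  vpc_id        = module.vpc.vpc_id # hypothetical output of modules/vpc
  instance_type = "t3.medium"
  min_size      = 3
}
```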
2. Terraform workspaces
One config; `terraform workspace select prod` switches which state file the CLI operates on. Fragile: one wrong switch and you apply dev changes to prod. Not recommended for env separation.
3. Terragrunt
A wrapper that keeps Terraform config DRY across envs (backend settings and shared inputs defined once). Popular at scale. Steeper learning curve.
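A sketch of a common Terragrunt shape, shown as two files in one block. It assumes a repo-root `root.hcl` and one directory per env and component (`envs/prod/vpc/`); the file names, bucket and paths are illustrative, and older setups often name the root file `terragrunt.hcl` instead.

```hcl
# --- root.hcl (repo root): shared by every env and component ---
# Generates backend.tf in each child, with a state key derived from its path.
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "acme-tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-locks"
    encrypt        = true
  }
}

# --- envs/prod/vpc/terragrunt.hcl: one small file per env + component ---
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../modules//vpc"
}

inputs = {
  name = "prod"
  cidr = "10.0.0.0/16"
}
```

Running Terragrunt inside `envs/prod/vpc/` would generate a backend with the key `envs/prod/vpc/terraform.tfstate`, so each component gets its own state without repeating the backend block.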
IaC in a CI/CD pipeline
Typical pipeline for an infra change:
```text
PR opened
   ↓
terraform fmt / validate / tflint / tfsec / checkov
   ↓
terraform plan  ──► post plan to PR as comment
   ↓
human review    ──► approve
   ↓
merge
   ↓
terraform apply ──► on main, gated by env (dev → staging → prod)
```
Critical: never apply from a laptop. Apply runs in CI with OIDC-federated creds, not long-lived keys.
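On AWS, “OIDC-federated creds” usually means an IAM role that the CI system’s identity token can assume. A sketch assuming GitHub Actions as the CI, a GitHub OIDC provider already registered in the account, and a hypothetical repo `acme/infra`; the role’s permission policies are left out.

```hcl
# Existing OIDC identity provider for GitHub Actions (assumed already set up).
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

# Trust policy: only workflows on main in acme/infra may assume the role.
data "aws_iam_policy_document" "ci_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:acme/infra:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "terraform_ci" {
  name               = "terraform-ci"
  assume_role_policy = data.aws_iam_policy_document.ci_assume.json
}
```

The workflow then exchanges its token for short-lived credentials (for example with the `aws-actions/configure-aws-credentials` action), so there is no long-lived access key to leak.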
Anti-patterns
- Clicking in the console, then “importing.” Do it occasionally; don’t make it a habit. Drift builds up.
- Hard-coding secrets in `.tf`. Declare the input with `sensitive = true` and supply it via an env var (`TF_VAR_<name>`), or fetch it from KMS/Vault at runtime.
- One giant state file for all infra. Blast radius = everything. Split by concern / team / lifetime (network, shared, per-app).
- Cross-state `data` reads without contracts. If `app` reads outputs from `network` state, pin the version or use explicit contracts. Otherwise a change in one silently breaks the other.
- Destroy protection on every resource, then `-target` hacks to get changes through. Put `prevent_destroy` on critical resources (prod DB, prod VPC) deliberately, not everywhere.
- Writing a giant monolithic module. Modules are for reuse; if a module is used only once, just put its resources inline.
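Three of the fixes above in one sketch: a sensitive input instead of a hard-coded secret, `prevent_destroy` on a critical resource, and a cross-state read limited to a published output. Assumptions: a PostgreSQL RDS instance, the networking state key from the earlier backend example, and a hypothetical `private_subnet_ids` output that the networking stack would have to declare.

```hcl
# Secret input: never written in .tf. CI supplies it at apply time,
# e.g. as the TF_VAR_db_password environment variable.
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output; still stored in state
}

# Cross-state read: rely only on outputs the network stack publishes on purpose.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "acme-tfstate-prod"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "prod" {
  name       = "prod-db"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids # hypothetical output
}

resource "aws_db_instance" "prod" {
  identifier           = "prod-db"
  engine               = "postgres"
  instance_class       = "db.t3.medium"
  allocated_storage    = 50
  username             = "app"
  password             = var.db_password
  db_subnet_group_name = aws_db_subnet_group.prod.name
  deletion_protection  = true # AWS-side guard against deletion

  # Terraform-side guard: any plan that would destroy this resource errors out.
  lifecycle {
    prevent_destroy = true
  }
}
```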
When NOT to use IaC
- One-off experiments. Click in the console; delete when done. IaC has overhead.
- Fully managed PaaS where there’s nothing to declare (e.g. some SaaS dashboards).
- App-layer state that the application manages (DB rows, queue messages). IaC manages resources, not data.