IaC Fundamentals

Infrastructure as Code = treat infrastructure the same way you treat application code: text files in git, code review, CI/CD, tests, rollback via revert. Every cloud resource, every VM, every DNS record, every firewall rule — declared in a file, applied by a tool. The console is for reading; the code is the source of truth.

Why IaC wins

Before IaC, infra lived in people’s memory, wikis, and the console. Consequences:

  • No history. “Why is this rule open?” — nobody knows; it’s been like that for years.
  • No repro. Dev, staging, prod diverge silently. Bugs only happen in prod.
  • No rollback. A bad change = a long ticket.
  • Snowflakes. Every server is unique. Every server is fragile. Nobody wants to replace one.

IaC fixes all four:

  • History → git log on the infra repo shows every change, author, reason.
  • Repro → the same code builds dev and prod. Differences are explicit.
  • Rollback → git revert + terraform apply. Minutes, not days.
  • Cattle, not pets → any server is disposable; recreate it from code.

The two approaches — declarative vs imperative

The central split. See Declarative vs Imperative Automation for the deep version.

Declarative — “what”

You describe the desired state; the tool figures out how to get there.

# Terraform
resource "aws_instance" "web" {
  ami           = "ami-0abcdef"
  instance_type = "t3.small"
  tags = { Name = "web-01" }
}

Run it once → instance is created. Run it again → nothing happens (state matches). Change t3.small to t3.medium → the tool computes the diff and applies only that change. This idempotence (see Idempotence) is the property that makes declarative IaC safe.

Examples: Terraform / OpenTofu, Pulumi, CloudFormation, Bicep, Kubernetes YAML, Ansible (with well-written modules).

Imperative — “how”

You describe the steps; the tool runs them.

aws ec2 run-instances --image-id ami-0abcdef --instance-type t3.small

Run twice → two instances. The user is responsible for the “is it already done?” check. Brittle, but fine for one-off operational tasks (restart this service, drain this node) that aren’t trying to model steady-state.

Examples: plain shell scripts, aws CLI, kubectl create, one-off Ansible command: tasks.

Rule of thumb: declarative for desired-state infra; imperative only for operational actions where running twice has a meaningful second effect (e.g. “restart the database”).
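The “is it already done?” check that imperative tooling pushes onto the user can be sketched as a shell guard. This is illustrative only — `instance_exists` and `create_instance` use a marker file as a stand-in for real AWS CLI calls (`describe-instances` / `run-instances`):

```shell
# Sketch of an idempotency guard around an imperative create.
STATE_FILE="${TMPDIR:-/tmp}/web-01.created"
rm -f "$STATE_FILE"                              # clean slate for the demo

instance_exists() { [ -f "$STATE_FILE" ]; }      # stand-in: query the cloud API
create_instance() { touch "$STATE_FILE"; echo "created web-01"; }

ensure_instance() {
  if instance_exists; then
    echo "web-01 already exists, skipping"       # second run is a no-op
  else
    create_instance
  fi
}

ensure_instance   # first run: creates
ensure_instance   # second run: skips
```

Declarative tools do exactly this diff-then-act dance for you, against real API state instead of a marker file.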

The IaC tool landscape

Two big layers: provisioning (create cloud resources) and config management (set up what’s inside VMs). Modern stacks often use one tool per layer.

Provisioning tools

| Tool | Language | Scope | Notes |
|---|---|---|---|
| Terraform | HCL | Multi-cloud, multi-provider | The de-facto standard. HashiCorp changed license → community fork OpenTofu (drop-in compatible) |
| OpenTofu | HCL | Multi-cloud | MPL-licensed fork of Terraform; functionally equivalent |
| Pulumi | TypeScript / Python / Go / C# | Multi-cloud | “IaC in a real programming language” |
| AWS CloudFormation | YAML / JSON | AWS only | First-party, deep AWS integration |
| Azure Bicep | Bicep DSL | Azure only | Nicer front-end to ARM templates |
| Google Deployment Manager | YAML | GCP | Largely superseded by Terraform + Config Controller |
| Crossplane | Kubernetes CRDs | Multi-cloud | IaC through Kubernetes; GitOps-native |
| CDK (AWS) / CDKTF | TypeScript/Python → CFN/Terraform | Multi-cloud (via TF) | Generates templates from code |

If you’re starting today and cloud-agnostic: Terraform or OpenTofu. If single-cloud and all-in: cloud-native (CloudFormation / Bicep) is also fine.

Config management tools

| Tool | Model | Notes |
|---|---|---|
| Ansible | Agentless, SSH / WinRM; YAML playbooks | Easiest to start; see Ansible Fundamentals |
| Puppet | Agent + master; declarative DSL | Long history, enterprise, declining |
| Chef | Agent + master; Ruby DSL | Acquired by Progress; declining |
| Salt | Agent or agentless; YAML + Python | Event-driven, fast at scale |
| cloud-init | First-boot script on cloud VMs | Pair with Terraform for “bootstrap + hand off” |
| Packer | Builds golden images | Reduces config mgmt footprint at runtime |

Modern preference: bake images with Packer + minimal cloud-init, then run as immutable containers / VMs. Ansible fills the gaps.
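As a concrete instance of the “bootstrap + hand off” pattern, here is a minimal cloud-init user-data file you might pass from Terraform via `user_data`. A sketch only — the package choice is illustrative:

```yaml
#cloud-config
# First-boot bootstrap; runs once when the VM is created.
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx
```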

Terraform in one screen

The one IaC tool you most need to know in 2026.

┌──────────────────────────────────────────────────────┐
│                                                      │
│    .tf files  ───► terraform plan ───► diff          │
│         │                                            │
│         │          terraform apply ───► cloud API    │
│         │                    │                       │
│         │                    ▼                       │
│         │             remote state                   │
│         │          (S3 + DynamoDB lock,              │
│         ▼           or Terraform Cloud)              │
│    git repo                                          │
│                                                      │
└──────────────────────────────────────────────────────┘

The core loop

terraform init          # download providers + configure backend
terraform fmt           # canonical formatting
terraform validate      # syntax/schema check
terraform plan          # show what will change
terraform apply         # do it (prompts for confirmation)
terraform destroy       # tear it down

Key concepts

  • Resource — a cloud object (aws_instance, azurerm_virtual_network). Declared in .tf.
  • Provider — plugin that talks to a platform (aws, azurerm, google, kubernetes, cloudflare, github, datadog, 500+ more).
  • Data source — read an existing resource without managing it (data "aws_vpc" "default" { default = true }).
  • Variable — input (variable "region" { default = "us-east-1" }).
  • Output — export a value to other modules or consumers.
  • Module — reusable bundle of resources, like a function. Can be local or pulled from the registry.
  • State — JSON file mapping declared resources to real-world IDs. Critical and sensitive.
  • Backend — where state lives (local, S3+DynamoDB, GCS, Azure Storage, Terraform Cloud). Prefer remote with locking.
  • Workspace — multiple state files from the same config (rarely the right abstraction for env separation — prefer separate configs).
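Several of these concepts tie together in a few lines. A hedged sketch — names and values are illustrative, not from the example above:

```hcl
variable "region" {
  type    = string
  default = "us-east-1"
}

# Data source: read the default VPC without managing it.
data "aws_vpc" "default" {
  default = true
}

# Resource: an object Terraform owns, referencing the data source.
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = data.aws_vpc.default.id
}

# Output: export a value for other configs or humans.
output "sg_id" {
  value = aws_security_group.web.id
}
```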

A minimal module use

# main.tf
terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
  backend "s3" {
    bucket         = "acme-tfstate-prod"
    key            = "networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-locks"
    encrypt        = true
  }
}
 
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"
 
  name            = "prod"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
}

State is sacred

The state file maps declared resources to the real ones. It has:

  • Secrets (DB passwords, private keys — anything that ever passed through terraform apply sits in state in plaintext).
  • The single source of truth for “who owns this resource.”
  • Metadata the cloud API doesn’t persist.

Rules:

  • Store remotely — S3 + DynamoDB lock, GCS, Terraform Cloud. Never local for shared projects.
  • Encrypt at rest.
  • Restrict access — state bucket policy limits who can read it.
  • Lock it — concurrent apply on the same state corrupts it. DynamoDB lock / TFC lock handles this.
  • Never edit by hand — use terraform state rm, terraform import, terraform state mv.
  • Back it up. Versioned S3 = free state history.

If state is lost: every resource appears “new,” and re-applying would create duplicates. Painful to recover via terraform import.
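If it comes to that, recovery means re-adopting each resource by hand. Roughly, per resource (the address and instance ID below are examples):

```shell
# Re-adopt an existing resource into fresh state instead of recreating it.
terraform import aws_instance.web i-0123456789abcdef0

# Verify: plan should show no changes if config matches reality.
terraform plan
```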

The testing story

IaC testing is younger than app testing but maturing fast:

| Layer | Tools |
|---|---|
| Static checks | terraform validate, tflint, terraform-docs |
| Policy / compliance | OPA / Conftest, Sentinel, Checkov, tfsec, Trivy |
| Unit | Terratest (Go), terraform test (built-in HCL test framework) |
| Integration | Spin up a scratch AWS account, apply + assert + destroy |
| Drift detection | terraform plan in CI on a schedule; alert on non-zero diff |

Minimum bar: fmt, validate, tflint, and at least one policy tool (Checkov catches “open security group to 0.0.0.0/0” style mistakes) on every PR.
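The built-in terraform test framework (Terraform ≥ 1.6) reads *.tftest.hcl files. A minimal sketch — the resource address and condition are illustrative:

```hcl
# tests/network.tftest.hcl
run "vpc_cidr_is_expected" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR drifted from the expected value"
  }
}
```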

Multi-environment — the repo layout question

Three common patterns:

1. Directory per env

infra/
├── modules/
│   ├── vpc/
│   └── app/
├── envs/
│   ├── dev/    # calls modules with dev inputs, dev state
│   ├── staging/
│   └── prod/

Clear separation, different backends per env. Preferred.

2. Terraform workspaces

One config, terraform workspace select prod. State-switch via CLI. Fragile — one wrong switch and you apply dev changes to prod. Not recommended for env separation.

3. Terragrunt

A wrapper that keeps Terraform config DRY across envs. Popular at scale. Steeper learning curve.

IaC in a CI/CD pipeline

Typical pipeline for an infra change:

PR opened
  ↓
terraform fmt / validate / tflint / tfsec / checkov
  ↓
terraform plan  ──► post plan to PR as comment
  ↓
human review  ──► approve
  ↓
merge
  ↓
terraform apply  ──► on main, gated by env (dev → staging → prod)

Critical: never apply from a laptop. Apply runs in CI with OIDC-federated creds, not long-lived keys.
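A sketch of what OIDC-federated apply looks like in GitHub Actions — the role ARN is a placeholder; aws-actions/configure-aws-credentials exchanges the runner’s OIDC token for short-lived credentials:

```yaml
# .github/workflows/apply.yml (fragment)
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-terraform  # placeholder
          aws-region: us-east-1
      - run: terraform init && terraform apply -auto-approve
```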

Anti-patterns

  1. Clicking in the console, then “importing.” Do it occasionally; don’t make it a habit. Drift builds up.
  2. Hard-coding secrets in .tf. Mark variables sensitive = true and pass values via TF_VAR_* env vars, or fetch from KMS/Vault at runtime.
  3. One giant state file for all infra. Blast radius = everything. Split by concern / team / lifetime (network, shared, per-app).
  4. Cross-state data reads without contracts. If app reads output from network state, pin the version or use explicit contracts. Otherwise changes in one break the other silently.
  5. prevent_destroy sprinkled everywhere, then -target hacks to work around it. Reserve prevent_destroy for genuinely critical resources (prod DB, prod VPC), applied intentionally.
  6. Writing a giant monolithic module. Modules are for reuse; if the code is used only once, keep it inline.
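Sketches for points 2 and 5 above — hedged; the resource names are examples:

```hcl
# 2. Sensitive input: value comes from the TF_VAR_db_password env var, never .tf.
variable "db_password" {
  type      = string
  sensitive = true
}

# 5. Deliberate destroy guard on a genuinely critical resource.
resource "aws_db_instance" "prod" {
  # ... engine, instance_class, etc.
  lifecycle {
    prevent_destroy = true
  }
}
```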

When NOT to use IaC

  • One-off experiments. Click in the console; delete when done. IaC has overhead.
  • Fully managed PaaS where there’s nothing to declare (e.g. some SaaS dashboards).
  • App-layer state that the application manages (DB rows, queue messages). IaC manages resources, not data.

See also