AWS EC2 Fundamentals
EC2 (Elastic Compute Cloud) is virtual machines as a service. It was one of AWS’s earliest services (launched in 2006) and is still the foundation of most workloads. Understand EC2 and you understand how AWS exposes IaaS in general.
What EC2 is
An EC2 instance is a virtual machine running on AWS’s hypervisor fleet. You pick:
- Instance type (CPU/RAM/network shape)
- AMI (the disk image — the OS and preinstalled software)
- Subnet (VPC placement → which AZ)
- Security group (firewall rules)
- Key pair (SSH access)
- User data (first-boot script)
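You pay per second while the instance runs, with a 1-minute minimum each time it starts. This applies to the common Linux and Windows AMIs; some other operating systems are billed by the hour.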
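As a quick illustration of the billing rule above, here is a minimal Python sketch. The function names and the hourly rate are hypothetical, and rounding partial seconds up is an assumption; real rates depend on instance type, region, and operating system.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # the 1-minute minimum mentioned above


def billed_seconds(runtime_seconds: float) -> int:
    """Seconds charged for a single run under per-second billing."""
    if runtime_seconds <= 0:
        return 0
    # Assumption: partial seconds round up; short runs are charged the minimum.
    return max(MINIMUM_BILLED_SECONDS, math.ceil(runtime_seconds))


def run_cost(runtime_seconds: float, hourly_rate_usd: float) -> float:
    """Compute cost of one run, given a (hypothetical) hourly on-demand rate."""
    return billed_seconds(runtime_seconds) * hourly_rate_usd / 3600


if __name__ == "__main__":
    # A 20-second run is billed as 60 seconds; a 90-second run as 90 seconds.
    print(billed_seconds(20), billed_seconds(90))
    print(round(run_cost(3600, 0.10), 4))  # one full hour at a made-up $0.10/h
```

The practical consequence is that very short-lived instances (for example in a tight launch/terminate loop) cost more per useful second than their runtime suggests.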
AMI — Amazon Machine Image
An AMI is a bootable disk image. Types:
- AWS-provided — Amazon Linux, Ubuntu, RHEL, Windows Server, etc. Maintained by AWS or the OS vendor.
- Marketplace — pre-built from third parties (often with licensing bundled)
- Your own — you create from a running instance (CreateImage) to capture a golden state
AMIs are region-scoped. Copying one to another region is an explicit action (CopyImage) and can take a long time for large images. For cross-region HA, copy ahead of time.
Two underlying storage types:
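- EBS-backed — root volume is an EBS volume created from a snapshot; the instance can be stopped and started, and data on the root volume persists across stop/start and reboot
- Instance store-backed (rare today) — root volume is ephemeral local disk; the instance cannot be stopped, only rebooted or terminated, and data is lost on termination or host failure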
Instance types — the menu
Instance types are named <family><generation><attributes>.<size>, for example m6i.large:
m 6 i . large
│ │ │   └──── size (nano, micro, small, medium, large, xlarge, 2xlarge...)
│ │ └──────── attributes, optional (a=AMD, g=Graviton/ARM, i=Intel, n=network-optimized)
│ └────────── generation (5, 6, 7)
└──────────── family (t, m, c, r...)
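To make the naming scheme concrete, the following Python sketch splits an instance type string into its parts. The regular expression is a simplification that covers common names such as m5.large or c6gn.2xlarge; it is not an official grammar, and the InstanceType and parse_instance_type names are made up for this example.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str       # e.g. "m", "c", "inf"
    generation: int   # e.g. 5, 6, 7
    attributes: str   # e.g. "i", "gn"; empty when the name has none
    size: str         # e.g. "large", "2xlarge"


# family letters, generation digits, optional attribute letters, ".", size
_NAME = re.compile(r"([a-z]+)(\d+)([a-z-]*)\.([a-z0-9-]+)")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name like 'm6i.large' into family, generation, attributes, size."""
    match = _NAME.fullmatch(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


if __name__ == "__main__":
    for example in ("m5.large", "m6i.large", "c6gn.2xlarge"):
        print(example, parse_instance_type(example))
```

Parsing names this way is mostly useful for reporting, for example grouping a fleet by family or spotting instances still on an old generation.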
Families you’ll see most:
| Family | Profile | Typical use |
|---|---|---|
| t | Burstable — CPU credits | Dev, small services, low-baseline workloads |
| m | General-purpose — balanced CPU/RAM | Default production choice |
| c | Compute-optimised — high CPU/RAM ratio | CPU-bound services, batch |
| r | RAM-optimised | Databases, caches, in-memory analytics |
| i / d | Storage-optimised — large local NVMe (i) or dense HDD (d) | Databases and data-heavy workloads needing fast or large local disks |
| g / p / inf / trn | GPU / ML accelerators | Training, inference, CUDA workloads |
Sizing hint: start smaller than you think (t3.medium/m6i.large) and resize up based on observed load. Oversizing is one of the most common sources of wasted spend.
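The "CPU credits" behind the burstable t family can be reasoned about with simple arithmetic: one CPU credit is one vCPU running at 100% for one minute. The sketch below estimates how long a credit balance lasts under sustained load. The function name and all input numbers are hypothetical; the real earn rate and vCPU count for a given instance size come from the AWS documentation.

```python
import math


def hours_until_credits_exhausted(
    credit_balance: float,
    credits_earned_per_hour: float,
    vcpus: int,
    avg_utilization: float,
) -> float:
    """Estimate hours until the credit balance reaches zero.

    avg_utilization is the average per-vCPU utilization as a fraction (0.0-1.0).
    Returns math.inf when the instance earns credits at least as fast as it
    spends them.
    """
    # One credit = one vCPU at 100% for one minute, so 60 credits per vCPU-hour.
    credits_spent_per_hour = vcpus * avg_utilization * 60
    net_drain = credits_spent_per_hour - credits_earned_per_hour
    if net_drain <= 0:
        return math.inf
    return credit_balance / net_drain


if __name__ == "__main__":
    # Made-up inputs: 144 credits banked, 12 earned per hour, 2 vCPUs at 50%.
    print(hours_until_credits_exhausted(144, 12, 2, 0.5))
```

If the result is only a few hours for your normal load, that is a hint the workload belongs on a fixed-performance family such as m or c instead.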
Instance lifecycle
pending → running → stopping → stopped → pending → running ...
          ↓
          shutting-down → terminated (destroyed; root EBS volume gone unless configured otherwise)
- Running — billed for compute + EBS
- Stopped — not billed for compute; still billed for EBS
- Terminated — gone; root volume deleted by default (unless “Delete on Termination” was unchecked)
- Rebooting — simple restart; stays on the same host
Stop/start vs reboot: stop/start moves the instance to a potentially different physical host → instance store data is lost, public IP changes (unless EIP). Reboot stays on the same host.
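The lifecycle rules above can be summarised as a small lookup table. This Python sketch models only the stable states (running, stopped, terminated) and skips transitional ones; the names TRANSITIONS, BILLED, apply_action, and side_effects are invented for illustration and are not part of any AWS SDK.

```python
# (current state, action) -> resulting stable state
TRANSITIONS = {
    ("running", "reboot"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("running", "terminate"): "terminated",
    ("stopped", "terminate"): "terminated",
}

# What you keep paying for in each state, per the list above.
BILLED = {
    "running": {"compute", "ebs"},
    "stopped": {"ebs"},
    "terminated": set(),  # assumes no volumes were kept after termination
}


def apply_action(state: str, action: str) -> str:
    """Return the state after an action, or raise if the action is not valid."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an instance that is {state}") from None


def side_effects(action: str, has_elastic_ip: bool = False) -> dict:
    """Effects of a reboot versus a stop (followed by a start)."""
    if action == "reboot":
        return {"may_change_host": False, "instance_store_lost": False,
                "public_ip_changes": False}
    if action == "stop":
        return {"may_change_host": True, "instance_store_lost": True,
                "public_ip_changes": not has_elastic_ip}
    raise ValueError(f"no side-effect model for action {action!r}")


if __name__ == "__main__":
    state = apply_action("running", "stop")
    print(state, BILLED[state])
    print(side_effects("stop", has_elastic_ip=True))
```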
Key pairs and initial access
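A key pair is an SSH key pair; for Windows instances it is used to decrypt the initial Administrator password for RDP. You create or import it once per region. At first boot, EC2 makes the public key available to the instance, where it is added to the default user’s authorized_keys file; the AMI itself is not modified.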
- You don’t set a root password on Amazon Linux / Ubuntu AMIs — SSH via key pair only
- Losing the private key means no SSH in. You’d need to stop the instance, attach root volume to another instance, and inject a new key — painful. Treat private keys with care.
- Better modern option: EC2 Instance Connect (ephemeral SSH keys via IAM) or Systems Manager Session Manager (no SSH, no open port 22, full IAM control)
User data — first-boot script
A script that, by default, runs on first boot only. It is passed at launch time and retrieved from IMDS by the cloud-init agent (Linux) or EC2Launch (Windows).
```bash
#!/bin/bash
# Runs as root on first boot: patch the OS, then install and start nginx
yum update -y
yum install -y nginx
systemctl enable --now nginx
```

Used for:
- Installing packages
- Downloading config
- Joining clusters
- Initial bootstrapping before a config management tool (Ansible, etc.) takes over
Limit: 16 KB of raw data, measured before base64 encoding. For anything bigger, keep the user data small and have it download the real payload (for example from S3).
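A pre-flight check for the size limit is easy to script. The sketch below validates a script against the 16 KB limit and base64-encodes it, which is the form the EC2 API expects. The function name is hypothetical, and in practice the AWS CLI and SDKs usually perform the encoding for you.

```python
import base64

USER_DATA_LIMIT_BYTES = 16 * 1024  # limit applies to the raw, unencoded data


def encode_user_data(script: str) -> str:
    """Check the raw size of a user-data script and return it base64-encoded."""
    raw = script.encode("utf-8")
    if len(raw) > USER_DATA_LIMIT_BYTES:
        raise ValueError(
            f"user data is {len(raw)} bytes; limit is {USER_DATA_LIMIT_BYTES}. "
            "Consider a small bootstrap script that downloads the rest."
        )
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    script = "#!/bin/bash\nyum install -y nginx\nsystemctl enable --now nginx\n"
    encoded = encode_user_data(script)
    # Round-trip to show that nothing is lost in the encoding.
    print(base64.b64decode(encoded).decode("utf-8") == script)
```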
Storage options
| Storage | Lifecycle | Performance | Use |
|---|---|---|---|
| EBS (Elastic Block Store) | Persists independent of instance | High IOPS available (io2 class) | Root volumes, databases, anything needing persistence |
| Instance store | Ephemeral — lost on stop/terminate | Local NVMe, very fast | Scratch, temp, distributed systems with their own replication |
| EFS (NFS) | Separate service | Network latency | Shared multi-instance filesystem |
| FSx | Separate service | Depends on flavour | Lustre for HPC, Windows File Server, NetApp ONTAP |
| S3 | Separate service | Object API only | Archives, artefacts, backups |
EBS volume types you’ll see:
- gp3 — general-purpose SSD; the modern default; baseline 3000 IOPS, tunable
- io2 — high-durability SSD with provisioned IOPS; for databases needing >16K IOPS
- st1 / sc1 — HDD volumes for sequential workloads: st1 is throughput-optimised, sc1 is the cheaper cold tier; for logs, data lakes
- gp2 — older default; gp3 is cheaper per GB and its IOPS do not depend on volume size, so migrate if you haven’t
Networking per instance
Every EC2 instance has at least one ENI (Elastic Network Interface):
- Private IP from the subnet
- Optional public IPv4 (auto-assigned or via Elastic IP)
- Optional IPv6
- One or more Security Groups
- MAC address (rarely matters; no L2 adjacency)
You can attach additional ENIs — common for multi-homed firewalls/NVAs, or to give an instance multiple IPs.
Placement and HA concepts
- AZ placement — you pick a subnet → that dictates AZ
- Placement groups — hint to AWS for co-location or separation:
  - Cluster — pack on same rack (low-latency HPC)
  - Spread — force separate physical servers (HA for small groups)
  - Partition — multiple logical partitions, each on separate infrastructure (HDFS, Kafka)
- Auto Scaling Groups (ASG) — launch/terminate instances dynamically based on policies; replaces unhealthy instances; spreads across AZs automatically
HA pattern: ASG + ALB/NLB + multi-AZ subnets. That’s the canonical “web tier” on AWS.
Pricing in short
| Mode | Discount | When to use |
|---|---|---|
| On-demand | 0% | Variable workloads, learning, spiky |
| Savings Plans (Compute / EC2 Instance) | 30-70% | Steady baseline for 1-3 years |
| Reserved Instances | 30-70% | Pre-SP legacy; similar effect |
| Spot | 70-90% | Interruptible workloads, batch jobs, stateless web tiers |
| Dedicated Hosts | — | License / compliance needs for physical isolation |
A mature AWS account blends: Savings Plans for baseline, On-demand for bursts, Spot for stateless bulk.
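To see why blending matters, here is a rough cost model in Python. Every number in it is a placeholder: the on-demand rate, the instance counts, and the discount fractions (picked from within the ranges in the table above) are assumptions for illustration, not quoted prices.

```python
def blended_hourly_cost(
    on_demand_rate: float,
    baseline_instances: int,
    burst_instances: int,
    spot_instances: int,
    savings_plan_discount: float,
    spot_discount: float,
) -> float:
    """Hourly cost of a fleet split across Savings Plans, On-demand, and Spot.

    Discounts are fractions of the on-demand rate, e.g. 0.5 for 50% off.
    """
    baseline = baseline_instances * on_demand_rate * (1 - savings_plan_discount)
    burst = burst_instances * on_demand_rate
    spot = spot_instances * on_demand_rate * (1 - spot_discount)
    return baseline + burst + spot


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand $/hour for one instance
    blended = blended_hourly_cost(rate, 10, 2, 8, 0.5, 0.8)
    all_on_demand = 20 * rate
    print(f"blended: ${blended:.2f}/h vs all on-demand: ${all_on_demand:.2f}/h")
```

The model ignores Spot interruptions and unused Savings Plan commitment, both of which reduce the real-world saving.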
Common pitfalls
- Stopped instances still cost money via EBS. Terminate what you don’t need.
- Public IPs disappear on stop/start unless you use an Elastic IP.
- t2.micro/t3.micro CPU credits run out under sustained load → the instance is throttled to its baseline (T2 default) or billed for surplus credits (T3 default, unlimited mode). Not obvious from basic metrics; check the CPUCreditBalance metric in CloudWatch.
- Root volume DeleteOnTermination is on by default: terminate, and the volume is gone. For important data, snapshot or detach first.
- Security group allowing 0.0.0.0/0 on port 22 is a common bad habit. Use SSM Session Manager or restrict to your IP.
- Choosing the wrong instance family: a compute-bound workload on t3 or a memory-heavy one on c6i is a classic anti-pattern. Monitor, resize.
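The open-SSH pitfall is easy to check for automatically. The sketch below scans a security group description for inbound rules that expose port 22 to the whole internet. It assumes the input dictionary has the shape of one entry from the EC2 DescribeSecurityGroups response (IpPermissions, IpRanges, Ipv6Ranges); the function name and the sample group are made up, and fetching real data from AWS is left out.

```python
def open_ssh_rules(security_group: dict) -> list:
    """Return inbound rules that allow port 22 from 0.0.0.0/0 or ::/0."""
    findings = []
    for rule in security_group.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            covers_ssh = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
        else:
            covers_ssh = False
        if not covers_ssh:
            continue
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", []))
        if open_v4 or open_v6:
            findings.append(rule)
    return findings


if __name__ == "__main__":
    sample_group = {  # hypothetical data in the DescribeSecurityGroups shape
        "GroupId": "sg-example",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        ],
    }
    print(len(open_ssh_rules(sample_group)))
```

A check like this fits well in a scheduled audit job; it only flags rules and leaves the decision to tighten or remove them to a human.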