AWS EC2 Fundamentals

EC2 (Elastic Compute Cloud) is virtual machines as a service. It was one of AWS’s earliest services (launched in 2006) and is still the foundation of most workloads. Understand EC2 and you understand how AWS exposes IaaS in general.

What EC2 is

An EC2 instance is a virtual machine running on AWS’s hypervisor fleet. You pick:

  • Instance type (CPU/RAM/network shape)
  • AMI (the disk image — the OS and preinstalled software)
  • Subnet (VPC placement → which AZ)
  • Security group (firewall rules)
  • Key pair (SSH access)
  • User data (first-boot script)
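As a rough illustration, the launch choices above map onto request parameters. The sketch below collects them into a dictionary shaped like the keyword arguments of boto3's run_instances call. build_launch_params is a hypothetical helper, and every ID shown is a made-up placeholder.

```python
def build_launch_params(ami_id, instance_type, subnet_id,
                        security_group_ids, key_name, user_data):
    """Collect the launch choices into run_instances-style keyword arguments."""
    return {
        "ImageId": ami_id,                             # AMI: OS and preinstalled software
        "InstanceType": instance_type,                 # CPU/RAM/network shape
        "SubnetId": subnet_id,                         # subnet, which also fixes the AZ
        "SecurityGroupIds": list(security_group_ids),  # firewall rules
        "KeyName": key_name,                           # key pair for SSH access
        "UserData": user_data,                         # first-boot script
        "MinCount": 1,                                 # launch exactly one instance
        "MaxCount": 1,
    }


# All identifiers below are placeholders, not real resources.
params = build_launch_params(
    ami_id="ami-0123456789abcdef0",
    instance_type="t3.medium",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    key_name="example-key",
    user_data="#!/bin/bash\necho hello > /tmp/first-boot\n",
)
print(params["InstanceType"], params["SubnetId"])

# With boto3 installed and credentials configured, the dictionary could be
# passed on as: boto3.client("ec2").run_instances(**params)
```

Keeping the parameters in a plain dictionary makes them easy to inspect or log before anything is actually launched.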

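On most operating systems (including Amazon Linux, Ubuntu and Windows) you pay per second, with a 60-second minimum, while the instance runs. A few commercial Linux distributions are still billed by the hour.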
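To make the billing rule concrete, here is a small sketch of per-second billing with a 60-second minimum. The hourly rate is a hypothetical figure chosen for easy arithmetic, not a real AWS price, and the function names are illustrative.

```python
import math


def billed_seconds(runtime_seconds, minimum_seconds=60):
    """Seconds charged for one run: whole seconds, never less than the minimum."""
    if runtime_seconds <= 0:
        return 0
    return max(math.ceil(runtime_seconds), minimum_seconds)


def run_cost(runtime_seconds, hourly_rate_usd):
    """Compute cost of one run, given a (hypothetical) hourly on-demand rate."""
    return billed_seconds(runtime_seconds) * hourly_rate_usd / 3600


HOURLY_RATE = 0.10  # hypothetical rate in USD per hour

for seconds in (20, 90, 3600):
    print(seconds, "s ->", billed_seconds(seconds), "s billed,",
          round(run_cost(seconds, HOURLY_RATE), 6), "USD")
```

A 20-second run is charged as 60 seconds, which is why very short-lived instances cost more per useful second than long-running ones.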

AMI — Amazon Machine Image

An AMI is a bootable disk image. Types:

  • AWS-provided — Amazon Linux, Ubuntu, RHEL, Windows Server, etc. Maintained by AWS or the OS vendor.
  • Marketplace — pre-built from third parties (often with licensing bundled)
  • Your own — you create from a running instance (CreateImage) to capture a golden state

AMIs are region-scoped. Copying to another region is a deliberate action (hours for large AMIs). For cross-region HA, you pre-copy.

Two underlying storage types:

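  • EBS-backed — root volume is an EBS volume created from a snapshot; the instance can be stopped and started, and root data survives both reboot and stop/start
  • Instance store-backed (rare today) — root volume is ephemeral local disk; the instance cannot be stopped, and data is lost on termination or host failure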

Instance types — the menu

Instance types are named <family><generation><attributes>.<size>:

    m 6 i . large
    │ │ │   └──── size (nano, micro, small, medium, large, xlarge, 2xlarge...)
    │ │ └──────── attributes, optional (a=AMD, g=Graviton/ARM, i=Intel, n=network-optimized)
    │ └────────── generation (5, 6, 7)
    └──────────── family
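The naming scheme can be checked mechanically. The following sketch splits a type name into family, generation, optional attribute letters and size with a regular expression. The pattern is an assumption that covers common names such as m5.large or c7gn.2xlarge, not an official grammar, and the attribute table lists only the letters mentioned above.

```python
import re

# Assumed pattern: letters (family), digits (generation),
# optional letters (attributes), a dot, then the size.
INSTANCE_TYPE_RE = re.compile(
    r"(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z-]*)\.(?P<size>[a-z0-9]+)"
)

ATTRIBUTE_MEANINGS = {
    "a": "AMD",
    "g": "Graviton/ARM",
    "i": "Intel",
    "n": "network-optimized",
}


def parse_instance_type(name):
    """Split an instance type name into its parts; raise ValueError if it does not fit."""
    match = INSTANCE_TYPE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised instance type name: {name!r}")
    parts = match.groupdict()
    parts["generation"] = int(parts["generation"])
    return parts


for example in ("m5.large", "m6i.large", "c7gn.2xlarge", "inf2.xlarge"):
    parsed = parse_instance_type(example)
    notes = [ATTRIBUTE_MEANINGS.get(letter, "?") for letter in parsed["attributes"]]
    print(example, parsed, notes)
```

Names that fall outside this simple pattern raise ValueError rather than being guessed at.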

Families you’ll see most:

Family             Profile                                  Typical use
-----------------  ---------------------------------------  ------------------------------------------
t                  Burstable — CPU credits                  Dev, small services, low-baseline workloads
m                  General-purpose — balanced CPU/RAM       Default production choice
c                  Compute-optimised — high CPU/RAM ratio   CPU-bound services, batch
r                  RAM-optimised                            Databases, caches, in-memory analytics
i / d              Storage-optimised — large local NVMe     Databases needing fast local disks
g / p / inf / trn  GPU / ML accelerators                    Training, inference, CUDA workloads

Sizing hint: start smaller than you think (t3.medium/m6i.large) and resize up based on observed load. Oversizing is one of the most common sources of wasted spend.

Instance lifecycle

  pending → running → stopping → stopped → pending → running ...
               ↓
         shutting-down → terminated (destroyed; root EBS volume deleted unless configured otherwise)

  • Running — billed for compute + EBS
  • Stopped — not billed for compute; still billed for EBS
  • Terminated — gone; root volume deleted by default (unless “Delete on Termination” was unchecked)
  • Rebooting — simple restart; stays on the same host

Stop/start vs reboot: a stop followed by a start may place the instance on a different physical host, so instance store data is lost and the public IPv4 address changes (unless you use an Elastic IP). A reboot stays on the same host and keeps both.
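The lifecycle and its billing consequences can be sketched as a small lookup table. This is a simplification: it models only the steady states and user actions described above, skips the transitional states, and assumes the default DeleteOnTermination setting for the root volume. All names are illustrative.

```python
# (current steady state, action) -> resulting steady state
TRANSITIONS = {
    ("running", "reboot"): "running",
    ("running", "stop"): "stopped",
    ("running", "terminate"): "terminated",
    ("stopped", "start"): "running",
    ("stopped", "terminate"): "terminated",
}

# What is billed in each steady state.
BILLED = {
    "running": {"compute": True, "ebs": True},
    "stopped": {"compute": False, "ebs": True},
    "terminated": {"compute": False, "ebs": False},  # root volume deleted by default
}

# Actions after which the instance may land on a different physical host.
MAY_CHANGE_HOST = {"start"}


def apply_action(state, action):
    """Return the next steady state, or raise ValueError for an invalid action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an instance that is {state}") from None


state = "running"
for action in ("reboot", "stop", "start", "terminate"):
    state = apply_action(state, action)
    print(f"{action:9} -> {state:10} billed={BILLED[state]} "
          f"host may change={action in MAY_CHANGE_HOST}")
```

Note that a terminated instance accepts no further actions, which matches the one-way nature of termination.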

Key pairs and initial access

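A key pair is an SSH key pair (on Windows it is used to decrypt the initial Administrator password for RDP). You create or import it once per region; at first boot EC2 places the public key in the instance’s authorized_keys file, and you keep the private key.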

  • You don’t set a root password on Amazon Linux / Ubuntu AMIs — SSH via key pair only
  • Losing the private key means no SSH in. You’d need to stop the instance, attach root volume to another instance, and inject a new key — painful. Treat private keys with care.
  • Better modern option: EC2 Instance Connect (ephemeral SSH keys via IAM) or Systems Manager Session Manager (no SSH, no open port 22, full IAM control)

User data — first-boot script

A script that, by default, runs only on first boot. It is passed at launch time and retrieved from IMDS by the cloud-init agent (Linux) or EC2Launch (Windows).

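#!/bin/bash
# Runs as root on first boot; assumes an Amazon Linux AMI with nginx in its default repos
yum update -y                 # apply pending package updates
yum install -y nginx          # install the web server
systemctl enable --now nginx  # start nginx now and on every boot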

Used for:

  • Installing packages
  • Downloading config
  • Joining clusters
  • Initial bootstrapping before a config management tool (Ansible, etc.) takes over

Limits: 16 KB of raw data, measured before base64 encoding. For anything bigger, use the user-data script to bootstrap a download of the real payload.
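A pre-flight check for the size limit might look like the sketch below. It assumes the limit applies to the raw script bytes and then base64-encodes the result, as is needed when calling the API without an SDK that encodes for you. encode_user_data is a hypothetical helper.

```python
import base64

USER_DATA_LIMIT_BYTES = 16 * 1024  # 16 KB, measured on the raw script


def encode_user_data(script):
    """Check the raw size, then return the script base64-encoded as ASCII text."""
    raw = script.encode("utf-8")
    if len(raw) > USER_DATA_LIMIT_BYTES:
        raise ValueError(
            f"user data is {len(raw)} bytes; limit is {USER_DATA_LIMIT_BYTES}"
        )
    return base64.b64encode(raw).decode("ascii")


script = "#!/bin/bash\nyum install -y nginx\nsystemctl enable --now nginx\n"
encoded = encode_user_data(script)
print(len(script.encode("utf-8")), "raw bytes ->", len(encoded), "encoded characters")
```

Failing fast on the client side gives a clearer error than a rejected launch request.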

Storage options

Storage                    Lifecycle                           Performance                      Use
-------------------------  ----------------------------------  -------------------------------  -------------------------------------------------------------
EBS (Elastic Block Store)  Persists independent of instance    High IOPS available (io2 class)  Root volumes, databases, anything needing persistence
Instance store             Ephemeral — lost on stop/terminate  Local NVMe, very fast            Scratch, temp, distributed systems with their own replication
EFS (NFS)                  Separate service                    Network latency                  Shared multi-instance filesystem
FSx                        Separate service                    Depends on flavour               Lustre for HPC, Windows File Server, NetApp ONTAP
S3                         Separate service                    Object API only                  Archives, artefacts, backups

EBS volume types you’ll see:

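  • gp3 — general-purpose SSD; the modern default; baseline 3,000 IOPS and 125 MiB/s, both tunable independently of volume size
  • io2 — high-durability SSD with provisioned IOPS; for databases needing >16K IOPS
  • st1 / sc1 — HDD volumes for sequential workloads: st1 is throughput-optimised, sc1 is the cheaper cold tier; for logs, data lakes
  • gp2 — older default, where IOPS scale with volume size; gp3 is cheaper per GB and usually at least as fast, so migrate if you haven’t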
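To see why small gp2 volumes benefit most from migration, the sketch below compares baseline IOPS. The gp2 formula (3 IOPS per GiB, floor of 100, ceiling of 16,000) is not stated in the text above. It is taken from AWS's published gp2 behaviour and should be treated as an assumption here, and burst credits are ignored.

```python
GP3_BASELINE_IOPS = 3000  # flat baseline, independent of volume size


def gp2_baseline_iops(size_gib):
    """Assumed gp2 baseline: 3 IOPS per GiB, clamped to the range 100..16000."""
    return min(max(3 * size_gib, 100), 16000)


def gp3_has_higher_baseline(size_gib):
    """True when an untuned gp3 volume of this size beats the gp2 baseline."""
    return GP3_BASELINE_IOPS > gp2_baseline_iops(size_gib)


for size in (10, 100, 500, 1000, 6000):
    print(f"{size:>5} GiB: gp2={gp2_baseline_iops(size):>5} IOPS, "
          f"gp3={GP3_BASELINE_IOPS} IOPS, "
          f"gp3 higher={gp3_has_higher_baseline(size)}")
```

Under these assumptions the two baselines meet at 1,000 GiB. Above that size a gp3 volume needs extra provisioned IOPS to match gp2.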

Networking per instance

Every EC2 instance has at least one ENI (Elastic Network Interface):

  • Private IP from the subnet
  • Optional public IPv4 (auto-assigned or via Elastic IP)
  • Optional IPv6
  • One or more Security Groups
  • MAC address (rarely matters; no L2 adjacency)

You can attach additional ENIs — common for multi-homed firewalls/NVAs, or to give an instance multiple IPs.

Placement and HA concepts

  • AZ placement — you pick a subnet → that dictates AZ
  • Placement groups — hint to AWS for co-location or separation:
    • Cluster — pack on same rack (low-latency HPC)
    • Spread — force separate physical servers (HA for small groups)
    • Partition — multiple logical partitions, each on separate infrastructure (HDFS, Kafka)
  • Auto Scaling Groups (ASG) — launch/terminate instances dynamically based on policies; replaces unhealthy instances; spreads across AZs automatically

HA pattern: ASG + ALB/NLB + multi-AZ subnets. That’s the canonical “web tier” on AWS.
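The value of spreading across AZs is easy to quantify. The sketch below is a deliberately simplified stand-in for what an ASG does: it distributes instances round-robin and reports how much capacity survives the loss of the busiest AZ. The AZ names and function names are illustrative only.

```python
def spread_across_azs(instance_count, azs):
    """Assign instances to AZs round-robin so counts differ by at most one."""
    if not azs:
        raise ValueError("at least one AZ is required")
    placement = {az: 0 for az in azs}
    for i in range(instance_count):
        placement[azs[i % len(azs)]] += 1
    return placement


def surviving_after_worst_az_loss(placement):
    """Instances left if the AZ holding the most instances goes down."""
    return sum(placement.values()) - max(placement.values())


zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]  # example AZ names
placement = spread_across_azs(5, zones)
print(placement)
print("left after losing the busiest AZ:", surviving_after_worst_az_loss(placement))
```

With everything in one AZ the surviving count would be zero, which is the case the multi-AZ pattern is designed to avoid.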

Pricing in short

Mode                                    Discount  When to use
--------------------------------------  --------  --------------------------------------------------------
On-demand                               0%        Variable workloads, learning, spiky
Savings Plans (Compute / EC2 Instance)  30-70%    Steady baseline for 1-3 years
Reserved Instances                      30-70%    Pre-SP legacy; similar effect
Spot                                    70-90%    Interruptible workloads, batch jobs, stateless web tiers
Dedicated Hosts                         n/a       License / compliance needs for physical isolation

A mature AWS account blends: Savings Plans for baseline, On-demand for bursts, Spot for stateless bulk.
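A back-of-the-envelope comparison shows what such a blend can save. Every number below is hypothetical: the hourly rate, the instance-hours per mode and the discounts (picked from inside the ranges quoted above) are illustrative, not real AWS prices.

```python
def blended_cost(hours_by_mode, on_demand_rate, discount_by_mode):
    """Total cost when each mode pays the on-demand rate minus its discount."""
    total = 0.0
    for mode, hours in hours_by_mode.items():
        discount = discount_by_mode.get(mode, 0.0)
        total += hours * on_demand_rate * (1.0 - discount)
    return total


ON_DEMAND_RATE = 0.10  # hypothetical USD per instance-hour

hours = {"savings_plan": 1000, "on_demand": 200, "spot": 500}       # instance-hours
discounts = {"savings_plan": 0.40, "on_demand": 0.0, "spot": 0.80}  # assumed discounts

blended = blended_cost(hours, ON_DEMAND_RATE, discounts)
all_on_demand = blended_cost(hours, ON_DEMAND_RATE, {})

print(f"blended:       {blended:.2f} USD")
print(f"all on-demand: {all_on_demand:.2f} USD")
print(f"saving:        {1 - blended / all_on_demand:.0%}")
```

The real saving depends heavily on how much of the load is steady enough to commit to and how much tolerates interruption.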

Common pitfalls

  1. Stopped instances still cost money via EBS. Terminate what you don’t need.
  2. Public IPs disappear on stop/start unless you use an Elastic IP.
  3. t2.micro/t3.micro CPU credits run out under sustained load → throttled to baseline (T2 default) or billed for surplus credits (T3 default, unlimited mode). Not obvious from basic metrics; check the CPUCreditBalance metric in CloudWatch.
  4. Root volume DeleteOnTermination is on by default — terminate, volume is gone. For important data, snapshot or detach first.
  5. Security groups that allow 0.0.0.0/0 on port 22 are a common bad habit. Use SSM Session Manager or restrict to your IP.
  6. Choosing the wrong instance family — compute-bound workload on t3 or memory-heavy on c6i is a classic anti-pattern. Monitor, resize.
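Pitfall 5 lends itself to an automated check. The sketch below inspects rules shaped like the IpPermissions list that describe_security_groups returns and flags SSH open to the whole internet. The sample rules are made up, and allows_world_ssh is a hypothetical helper rather than an AWS API.

```python
SSH_PORT = 22
WORLD_V4 = "0.0.0.0/0"
WORLD_V6 = "::/0"


def allows_world_ssh(ip_permissions):
    """True if any ingress rule exposes port 22 to every IPv4 or IPv6 address."""
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = perm.get("FromPort", 0) <= SSH_PORT <= perm.get("ToPort", 65535)
        else:
            covers_ssh = False

        open_v4 = any(r.get("CidrIp") == WORLD_V4 for r in perm.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == WORLD_V6 for r in perm.get("Ipv6Ranges", []))
        if covers_ssh and (open_v4 or open_v6):
            return True
    return False


# Made-up example rules.
risky = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
          "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
restricted = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
               "IpRanges": [{"CidrIp": "203.0.113.7/32"}]}]
web_only = [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]

print(allows_world_ssh(risky), allows_world_ssh(restricted), allows_world_ssh(web_only))
```

A check like this could run periodically against every security group in an account, although it only covers direct CIDR rules and not access granted through other paths.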

See also