AWS EBS Deep-Dive

EBS — Elastic Block Store — is network-attached block storage for EC2. It looks and behaves like a local disk from the guest OS, but it lives in a separate storage fleet and persists independently of the instance. Most EC2 root volumes, and the storage behind services like RDS, are EBS. Picking the right volume type is the single biggest storage-performance lever in AWS.

What EBS is

An EBS volume is a virtual disk attached to one (or in some cases many) EC2 instances. Key characteristics:

  • Persistent — data survives instance stop; it also survives terminate unless DeleteOnTermination is set (it is, by default, on root volumes)
  • AZ-scoped — a volume lives in one AZ; can only attach to instances in the same AZ
  • Replicated within an AZ — EBS maintains multiple copies internally for durability (not cross-AZ)
  • Block protocol — raw blocks exposed via the hypervisor; the guest OS formats as ext4/xfs/NTFS/etc.
  • Resizable — grow online; rarely need to detach (can’t shrink an EBS volume)
  • Snapshotable — point-in-time copies stored in S3 (not directly visible as S3 objects)

Volume types — the core decision

There are two SSD families, two HDD families, and one specialty:

| Type              | Class                | IOPS (max)              | Throughput (max)          | Volume size      | Typical use                                        |
|-------------------|----------------------|-------------------------|---------------------------|------------------|----------------------------------------------------|
| gp3               | General SSD          | 16,000 (baseline 3,000) | 1,000 MB/s (baseline 125) | 1 GiB – 16 TiB   | Default for most workloads                         |
| gp2               | General SSD (legacy) | 16,000 (size-linked)    | 250 MB/s                  | 1 GiB – 16 TiB   | Older default; gp3 is cheaper and strictly better  |
| io2 Block Express | High-perf SSD        | 256,000                 | 4,000 MB/s                | 4 GiB – 64 TiB   | Databases needing > 16k IOPS, mission-critical     |
| io2               | High-perf SSD        | 64,000                  | 1,000 MB/s                | 4 GiB – 16 TiB   | Databases, 99.999% durability                      |
| st1               | Throughput HDD       | 500                     | 500 MB/s                  | 125 GiB – 16 TiB | Big sequential: logs, data lakes, stream processing |
| sc1               | Cold HDD             | 250                     | 250 MB/s                  | 125 GiB – 16 TiB | Cheap, seldom-accessed                             |
| (magnetic)        | Standard             | n/a                     | n/a                       | n/a              | Legacy; unavailable in new regions; don't use      |

gp3 — the modern default

gp3 decouples size, IOPS, and throughput. You buy:

  • Baseline 3,000 IOPS + 125 MB/s free with any size
  • Extra IOPS (up to 16k) and throughput (up to 1000 MB/s) independently, at additional cost

Compared to gp2: gp3 is usually ~20% cheaper and gives provisioned IOPS without being size-linked. Migrate gp2 → gp3 — it’s a single modify-volume call with no downtime.
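The savings are easy to sanity-check. A back-of-envelope sketch, using illustrative us-east-1-style prices (assumptions as of writing — check the current price list before relying on them), comparing a 1 TiB gp2 volume with a gp3 volume provisioned to the same IOPS:

```python
# Illustrative prices (assumed, not authoritative):
GP2_GB_MONTH = 0.10      # $/GB-month
GP3_GB_MONTH = 0.08      # $/GB-month
GP3_IOPS_MONTH = 0.005   # $/provisioned IOPS-month above the 3,000 baseline
GP3_MBPS_MONTH = 0.04    # $/MB/s-month above the 125 MB/s baseline

def gp2_monthly(size_gib: int) -> float:
    """gp2 bills on size only; IOPS are size-linked (3 per GiB)."""
    return size_gib * GP2_GB_MONTH

def gp3_monthly(size_gib: int, iops: int = 3000, mbps: int = 125) -> float:
    """gp3 bills size, extra IOPS, and extra throughput independently."""
    extra_iops = max(0, iops - 3000)
    extra_mbps = max(0, mbps - 125)
    return (size_gib * GP3_GB_MONTH
            + extra_iops * GP3_IOPS_MONTH
            + extra_mbps * GP3_MBPS_MONTH)

# A 1 TiB gp2 volume (3,072 baseline IOPS) vs gp3 provisioned to the same IOPS:
print(round(gp2_monthly(1024), 2))        # 102.4
print(round(gp3_monthly(1024, 3072), 2))  # 82.28, roughly 20% cheaper
```

At these prices gp3 wins even after buying back gp2's size-linked IOPS, which is why the migration is close to a no-brainer.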

gp2 — why you still see it

gp2 links IOPS to volume size (3 IOPS per GiB baseline, minimum 100) plus a burst-credit system for small volumes. The credit model leads to surprise throttling on small-but-busy volumes. Today gp2 is inferior to gp3 in every way.

io2 / io2 Block Express

Provisioned IOPS SSDs for demanding databases. io2 Block Express is the modern tier — up to 256,000 IOPS and 4,000 MB/s, with sub-millisecond latency. Durability is 99.999% vs. 99.8-99.9% for gp3. Pricier both per GiB and per provisioned IOPS.

st1 / sc1

Spinning-disk tiers tuned for sequential workloads. Terrible for random I/O — don’t put a database here. Cheap per GB; st1 for “warm” data (logs), sc1 for “cold” (rarely touched archive, but consider S3 instead).

Performance concepts

IOPS vs throughput

  • IOPS — I/O operations per second (many small reads/writes). Relevant for DB transactions, small files.
  • Throughput (MB/s) — bytes per second (big sequential). Relevant for streaming, logs, bulk loads.
  • A workload hits whichever limit it reaches first: small random I/O exhausts the IOPS ceiling long before the throughput one; large sequential I/O does the opposite.
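The two metrics are linked by I/O size, which makes it easy to predict which limit binds. A quick sketch of the arithmetic:

```python
# throughput (MB/s) ≈ IOPS × I/O size. Which limit a volume hits first
# depends entirely on the size of each operation.

def throughput_mbps(iops: int, io_size_kib: int) -> float:
    """Approximate MB/s generated by `iops` operations of `io_size_kib` each."""
    return iops * io_size_kib * 1024 / 1_000_000

# 16 KiB random reads at gp3's 16,000 IOPS ceiling:
print(throughput_mbps(16_000, 16))  # 262.144 (IOPS-bound: far below the 1,000 MB/s cap)
# 1 MiB sequential reads need under 1,000 IOPS to saturate 1,000 MB/s:
print(throughput_mbps(977, 1024))   # 1024.458752 (throughput-bound)
```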

Baseline vs burst (gp2)

gp2 volumes under 1 TiB burst to 3,000 IOPS via a credit bucket. If credits drain, the volume throttles to baseline (3 IOPS per GiB, minimum 100). Monitor the BurstBalance CloudWatch metric. gp3 has no credit model — consistent performance.
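A toy model of the credit bucket makes the throttling timeline concrete (simplified from AWS's per-second accounting; the 5.4 million I/O credit cap and 3,000 IOPS burst ceiling are the documented gp2 figures):

```python
# Simplified gp2 burst-credit model.

def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor 100, ceiling 16,000."""
    return max(100, min(16_000, 3 * size_gib))

def seconds_until_throttle(size_gib: int, demand_iops: int,
                           credits: float = 5_400_000) -> float:
    """How long a sustained load runs before dropping to baseline."""
    baseline = gp2_baseline_iops(size_gib)
    burst = min(demand_iops, 3000)   # burst ceiling
    if burst <= baseline:
        return float("inf")          # demand within baseline: never throttles
    drain_rate = burst - baseline    # net credits consumed per second
    return credits / drain_rate

# A 100 GiB gp2 volume (300 baseline IOPS) under a sustained 3,000 IOPS load:
print(round(seconds_until_throttle(100, 3000) / 60))  # 33 minutes of burst
```

Half an hour of full speed, then a 10x drop to 300 IOPS: exactly the "surprise throttling" failure mode described above.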

Instance-level throughput limits

EBS volume performance is bounded by the instance’s EBS bandwidth. A 4 TiB io2 Block Express on a t3.small will be bottlenecked by the instance. Check “EBS-optimized” throughput per instance type. Newer instances (m6i, r6i, m7g) have much higher EBS bandwidth.
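The effective ceiling is simply the minimum of the two limits. A sketch with representative per-instance EBS bandwidth figures (assumed values for illustration, not authoritative — look up the real numbers per instance type):

```python
# Assumed/representative EBS bandwidth per instance type, in MB/s.
# Not authoritative: check the "EBS-optimized" tables for real figures.
INSTANCE_EBS_MBPS = {
    "t3.small": 260,
    "m6i.4xlarge": 1250,
    "r6i.16xlarge": 2500,
}

def effective_mbps(volume_mbps: int, instance_type: str) -> int:
    """Real-world throughput is capped by whichever limit is lower."""
    return min(volume_mbps, INSTANCE_EBS_MBPS[instance_type])

# A 4,000 MB/s io2 Block Express volume behind various instances:
print(effective_mbps(4000, "t3.small"))      # 260: the instance is the bottleneck
print(effective_mbps(4000, "r6i.16xlarge"))  # 2500: still instance-bound
```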

Queue depth and latency

High-performance volumes need parallel I/O to hit quoted numbers. A single thread doing one operation at a time won’t saturate 64k IOPS. Tune app-level concurrency and filesystem readahead for bulk workloads.
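Little's law makes this concrete: sustained IOPS ≈ operations in flight ÷ per-operation latency, so a high IOPS target needs proportionally deep queues:

```python
# Little's law applied to storage: IOPS = queue_depth / latency.

def required_queue_depth(target_iops: int, latency_ms: float) -> float:
    """Operations that must be in flight to sustain target_iops."""
    return target_iops * (latency_ms / 1000)

# One synchronous thread at 0.5 ms per I/O tops out around:
print(1000 / 0.5)                         # 2000.0 IOPS, nowhere near 64k
# Sustaining 64,000 IOPS at 0.5 ms latency needs this many I/Os in flight:
print(required_queue_depth(64_000, 0.5))  # 32.0
```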

Snapshots — the backup primitive

A snapshot is a point-in-time, incremental, compressed copy of a volume, stored in S3 behind the scenes (not directly visible as S3 objects — opaque service storage).

Behaviour:

  • First snapshot of a volume = full copy
  • Subsequent snapshots = only changed blocks since the last
  • Delete any snapshot — AWS rebalances the chain so surviving snapshots remain usable (no “dependency chain” for the user to worry about)
  • Snapshots are region-scoped — copy to another region for DR
  • Sharable across accounts (at snapshot level) or made public
  • KMS-encrypted if the source volume was encrypted; you can change the KMS key during copy
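The billing consequence of the incremental model, in a sketch: you pay for the first full copy plus each snapshot's changed blocks, not for N full copies.

```python
# Incremental snapshot storage: first snapshot stores all allocated blocks,
# each later snapshot stores only blocks changed since the previous one.

def billed_gib(first_full_gib: float, changed_gib_per_snapshot: list) -> float:
    """Total GiB retained (and billed) across the whole snapshot chain."""
    return first_full_gib + sum(changed_gib_per_snapshot)

# A 200 GiB volume, then four more snapshots that each changed ~5 GiB:
print(billed_gib(200, [5, 5, 5, 5]))  # 220, not 5 x 200 = 1000
```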

Snapshot lifecycle (DLM)

Data Lifecycle Manager automates snapshot schedules + retention:

Policy: every 6 hours, retain 28, cross-region copy to us-west-2

Tag-based targeting, FIFO retention, optional AMI policies. The lazy path: don’t script snapshots, use DLM.

Recovery

aws ec2 create-volume --snapshot-id snap-... --availability-zone <target AZ> → new volume, same size as the snapshot. You can size up at creation time.

Lazy loading: a restored volume returns immediately but pulls blocks lazily from S3 on first access — initial read performance is poor. fio or dd the whole device to pre-warm; or use Fast Snapshot Restore (paid) to make it fully loaded from the start.

Encryption at rest

EBS encryption is a flip-the-switch KMS integration:

  1. Create volume → “encrypt with key <alias/aws/ebs> or customer-managed CMK”
  2. AWS generates a data key, wraps it with your CMK, stores wrapped key with volume
  3. At attach time, hypervisor calls KMS:Decrypt once to unwrap
  4. Plaintext data key lives in the hypervisor; every write is encrypted transparently

Guest OS sees a normal block device; no performance penalty.
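The four-step envelope flow above, in a toy sketch (XOR with a hash-derived pad stands in for AES, and a local byte string stands in for the KMS-held CMK; this illustrates only the key flow, not real cryptography):

```python
import hashlib
import os

def xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR against a SHA-256-derived pad. NOT real crypto."""
    pad = hashlib.sha256(key).digest()
    return bytes(b ^ pad[i % 32] for i, b in enumerate(data))

cmk = os.urandom(32)          # the CMK: lives in "KMS", never leaves it
data_key = os.urandom(32)     # per-volume plaintext data key
wrapped = xor(data_key, cmk)  # wrapped key, stored with the volume metadata

# Attach time: one "KMS:Decrypt" call unwraps the data key on the hypervisor.
unwrapped = xor(wrapped, cmk)
assert unwrapped == data_key

# From then on every block write/read is transparently encrypted/decrypted:
block = b"hello ebs"
ciphertext = xor(block, unwrapped)
assert xor(ciphertext, unwrapped) == block
```

The key point the sketch shows: only the wrapped key is persisted, so KMS is consulted once per attach, not once per I/O, which is why there is no performance penalty.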

Snapshots of encrypted volumes are encrypted (same key). New volumes from encrypted snapshots are encrypted. You cannot unencrypt a volume — only create an unencrypted copy by mounting and copying files out.

Enable “EBS Encryption by Default” at the regional account level — every new volume gets encrypted, period.

Attachment & multi-attach

  • A standard EBS volume attaches to one instance at a time
  • io1 and io2 support Multi-Attach (Nitro-based instances only) — up to 16 Nitro instances in the same AZ can attach one volume concurrently
  • The guest apps must coordinate (cluster filesystem or app-level locking) — EBS provides no concurrency guarantees itself
  • Typical use: Oracle RAC, SAP, shared-storage database clusters

For most “shared filesystem” needs, EFS or FSx is the correct choice, not Multi-Attach EBS.

Resizing

You can:

  • Grow size (any time, online)
  • Change type (gp2 → gp3, gp3 → io2, etc.)
  • Change IOPS/throughput (gp3, io1, io2)

After a resize, the guest OS needs to be told (growpart + resize2fs / xfs_growfs).

You cannot shrink — to shrink you create a smaller volume and dd / rsync data over.

Modification cooldown: at least 6 hours between successive modifications of the same volume.

Durability and availability

  • Annual failure rate ~0.1-0.2% for gp3, ~0.001% for io2 (five 9s)
  • Replicated within one AZ — an AZ outage can make a volume unavailable; data isn’t lost but you can’t read it until the AZ recovers
  • For cross-AZ resilience: snapshots + multi-AZ replicas at the app layer (RDS Multi-AZ does this automatically)
  • RAID on EBS? Rarely worth it — EBS already replicates internally. Some workloads do RAID-0 across EBS volumes for aggregated throughput on a single instance. RAID Levels has the general trade-offs.

Cost levers

  • gp2 → gp3 migration almost always saves money
  • Right-size: overprovisioned IOPS on gp2 (large-volume-for-IOPS trick) wastes GB capacity; gp3 lets you tune independently
  • Delete orphaned volumes after terminating instances — root volumes DeleteOnTermination=true by default, but detached extra volumes linger forever
  • Delete snapshots you don’t need — incremental, but still billed per byte stored
  • Move cold snapshots to Archive tier — 75% cheaper, but 24-72h restore time
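The Archive-tier math is worth running before flipping the switch, because an archived snapshot is stored as a full copy (not incremental) with a 90-day minimum retention. A sketch with assumed prices:

```python
# Illustrative snapshot prices (assumed, not authoritative):
STANDARD_GB_MONTH = 0.05    # $/GB-month, billed on the incremental footprint
ARCHIVE_GB_MONTH = 0.0125   # 75% cheaper, but billed on the full copy

def standard_cost(incremental_gib: float, months: int) -> float:
    return incremental_gib * STANDARD_GB_MONTH * months

def archive_cost(full_gib: float, months: int) -> float:
    return full_gib * ARCHIVE_GB_MONTH * max(months, 3)  # 90-day minimum

# A snapshot whose incremental footprint is 40 GiB, on a 200 GiB volume:
print(standard_cost(40, 12))  # 24.0
print(archive_cost(200, 12))  # 30.0 (Archive loses here: it stores the full copy)
```

Archive only pays off when the incremental footprint is already a large fraction of the full size, or the retention is long; for small incrementals, Standard can be cheaper.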

Instance store vs EBS

Not all EC2 storage is EBS. Instance store (aka ephemeral NVMe) is local disk on the physical host running the instance. Characteristics:

  • Ephemeral — lost on stop/terminate
  • Very fast — local NVMe, no network round-trip
  • Free — included in instance pricing
  • Not snapshottable
  • Sizes fixed per instance type (some instance types have none)

Use for: scratch, temp, distributed DBs with their own replication (Cassandra, Kafka), high-bandwidth caches. Never use for: anything you can’t afford to lose.

Common pitfalls

  1. DeleteOnTermination for root volumes is ON by default. Terminate, data gone. For critical data, either detach before terminate or toggle the flag.
  2. Assuming gp2 burst lasts forever. Under sustained load, small gp2 volumes drain burst and throttle. Move to gp3.
  3. Attaching to the wrong AZ. Volume and instance must match AZ.
  4. Unencrypted volumes sneaking in. Enable “Encrypt by Default” at the regional level.
  5. Lazy-loaded restored volumes performing poorly on first access. Pre-warm or pay for Fast Snapshot Restore.
  6. Multi-Attach without a cluster FS. Two nodes writing to the same blocks = corruption. EBS doesn’t prevent this.
  7. Snapshot pricing surprise. Frequent snapshots of a churny volume = lots of incremental data + storage cost. Tune DLM retention and consider Archive tier.
  8. Resizing EBS without growing the filesystem. Volume is 500 GiB, df still shows 100 GiB. Run growpart + resize2fs/xfs_growfs.

Mental model

  • EBS = a SAN that AWS runs for you — iSCSI-adjacent, but via the Nitro hypervisor. SAN vs NAS concepts apply.
  • Volume type = your seat on the performance curve. gp3 for most, io2 BX for extreme, HDD for cheap sequential.
  • Snapshots = incremental, S3-backed backups with near-infinite durability.
  • Encryption is basically free — always on.
  • The instance’s EBS bandwidth is the real ceiling at high IOPS.

See also