# AWS S3 Fundamentals
S3 — Simple Storage Service — is AWS’s object store. It was the first AWS service (2006) and has become one of the foundational building blocks of the cloud. Almost every other AWS service interacts with S3 in some form.
## What S3 is
S3 is object storage — not file, not block. The distinction matters:
| | Block | File | Object |
|---|---|---|---|
| Accessed by | Block offset | Path (/home/user/file) | Flat key over HTTP |
| Protocol | iSCSI, NVMe-oF | NFS, SMB | HTTPS (S3 API) |
| Mutability | Read/write anywhere | Read/write anywhere | Replace whole objects |
| Max size | Depends on filesystem | Depends on filesystem | 5 TB per object |
| Scale | TBs | TBs | Effectively unlimited |
| AWS example | EBS | EFS | S3 |
See SAN vs NAS for the block/file/object storage deep-dive.
## Buckets, objects, keys
- Bucket — the container. Globally unique name (`my-company-logs`), bound to one region.
- Object — any file-like blob. Has a key (the “filename”), metadata, and data.
- Key — a string up to 1024 bytes. Forward slashes are allowed in keys and displayed as folders in the console, but there are no real folders — just keys with `/` in them.
```
Bucket: my-company-logs
Objects:
  logs/2026/04/23/app.log
  logs/2026/04/24/app.log
  backups/db-2026-04-23.sql.gz
  index.html
```
Treating those slashes as folders is a useful convention; the underlying model is flat.
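The flat model is easy to see in code. The sketch below mimics how a prefix/delimiter listing (the mechanism behind `ListObjectsV2` and the console’s folder view) derives “folders” from nothing but flat keys; the function is illustrative, not an AWS API.

```python
# The console's "folders" are computed from a flat key list, the way a
# prefix + delimiter listing works. Nothing here is a real directory.
keys = [
    "logs/2026/04/23/app.log",
    "logs/2026/04/24/app.log",
    "backups/db-2026-04-23.sql.gz",
    "index.html",
]

def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic S3's prefix/delimiter listing over a flat key space."""
    common_prefixes, contents = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter looks like a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(common_prefixes), contents

print(list_with_delimiter(keys))
# → (['backups/', 'logs/'], ['index.html'])
print(list_with_delimiter(keys, prefix="logs/2026/04/"))
# → (['logs/2026/04/23/', 'logs/2026/04/24/'], [])
```

Listing with a prefix but no delimiter would instead return every matching key, which is why deep “folder trees” cost nothing extra: they never existed.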
## Regions, endpoints, global names
- Bucket names are globally unique across all AWS accounts (a historical API design choice)
- Data lives in the region where the bucket was created
- Accessed via regional endpoints, normally virtual-hosted style (`https://my-bucket.s3.eu-central-1.amazonaws.com/key`); the older path style (`https://s3.eu-central-1.amazonaws.com/my-bucket/key`) is deprecated for new buckets
- Cross-region access is not automatic — S3 Replication (CRR) is a feature you explicitly configure for backups/DR
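The two addressing styles are just different URL shapes over the same bucket. A quick sketch, with example bucket/region/key values:

```python
# Two ways to address the same object; the bucket name moves between the
# hostname (virtual-hosted) and the path (path style).
def virtual_hosted(bucket, region, key):
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def path_style(bucket, region, key):
    # older style, deprecated for new buckets
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(virtual_hosted("my-company-logs", "eu-central-1", "index.html"))
# → https://my-company-logs.s3.eu-central-1.amazonaws.com/index.html
```

The hostname-embedded bucket name is one reason bucket names must be DNS-compatible (lowercase, no underscores).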
## Storage classes
S3 is not one storage tier — it’s several, each with different cost/availability/retrieval profiles.
| Class | Purpose | Retrieval | Relative cost |
|---|---|---|---|
| Standard | Default; hot data | Instant | $$$ |
| Intelligent-Tiering | Auto-tier based on access | Instant | $$ |
| Standard-IA | Infrequent access, needs instant retrieval | Instant (but retrieval fee) | $$ |
| One Zone-IA | IA, single AZ (lower durability) | Instant | $ |
| Glacier Instant Retrieval | Archive, rare access | Instant | $ |
| Glacier Flexible Retrieval | Archive | Minutes to hours | $ |
| Glacier Deep Archive | Long-term archive (7+ years) | Hours | ¢ |
Rules of thumb:
- Don’t move small objects to Glacier — per-object metadata overhead and minimum storage duration charges can eat the savings
- Use Intelligent-Tiering when access patterns are unknown; it auto-optimises
- Lifecycle rules automate transitions: “after 90 days, move to IA; after 1 year, to Glacier; after 7 years, delete”
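The quoted lifecycle rule of thumb can be written out as the configuration document S3 accepts (the shape used by `put-bucket-lifecycle-configuration`); the rule ID and prefix here are illustrative:

```python
import json

# "After 90 days, move to IA; after 1 year, to Glacier; after 7 years,
# delete" as a lifecycle configuration. ID and Prefix are placeholders.
lifecycle = {
    "Rules": [{
        "ID": "age-out-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},       # only keys under logs/
        "Transitions": [
            {"Days": 90, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 2555},        # ~7 years, then delete
    }]
}
print(json.dumps(lifecycle, indent=2))
```

Transitions and expiration are evaluated per object from its creation time, so a rule applied today also ages out objects uploaded years ago.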
## Durability and availability
- Durability — probability of not losing an object. S3 Standard is designed for eleven nines (99.999999999%) of durability — statistically, if you store 10 million objects you can expect to lose a single object about once every 10,000 years. This is achieved via redundant storage across multiple AZs.
- Availability — probability of being able to access an object right now. S3 Standard targets 99.99%; One Zone-IA drops to 99.5%.
These are not the same. Data is almost never lost. Being able to read it right now is a separate guarantee.
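The eleven-nines figure turns into the “10 million objects, one loss per 10,000 years” illustration with one line of arithmetic, assuming the design target means roughly a 1e-11 annual loss probability per object:

```python
# Back-of-envelope reading of "eleven nines" durability.
durability = 0.99999999999          # eleven nines
annual_loss_prob = 1 - durability   # ~1e-11 per object per year (assumed)
objects = 10_000_000

expected_losses_per_year = objects * annual_loss_prob   # ~1e-4
years_per_lost_object = 1 / expected_losses_per_year    # ~10,000
print(f"Storing {objects:,} objects: expect ~1 loss every "
      f"{years_per_lost_object:,.0f} years")
```

Run the same arithmetic with a 99.99% availability target and you get roughly 52 minutes of unavailability per year: a vastly weaker guarantee, which is the point of the distinction.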
## Access control — the layers
Four mechanisms combine to determine access. They are evaluated together.
- IAM policies (identity-based) — what your principals are allowed to do across S3
- Bucket policies (resource-based JSON on the bucket) — who can touch this specific bucket
- ACLs (legacy per-object and per-bucket) — mostly avoid; modern default has ACLs disabled
- Block Public Access — a global override; when enabled (as it is by default on new buckets), it blocks anything public regardless of the other layers
The practical modern approach:
- Block Public Access: ON on every bucket (default since 2023)
- Grant access via bucket policy for cross-account and via IAM for same-account
- Leave ACLs disabled (the “Object Ownership: Bucket owner enforced” setting — now default on new buckets)
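For the cross-account case, a minimal bucket policy looks like the sketch below; the account ID and bucket name are placeholders, and the other account’s own IAM policies must also allow the actions:

```python
import json

# Hypothetical cross-account read grant. 111122223333 and my-company-logs
# are placeholders. Note ListBucket targets the bucket ARN while GetObject
# targets object ARNs, hence both Resource entries.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-company-logs",
            "arn:aws:s3:::my-company-logs/*",
        ],
    }],
}
print(json.dumps(policy, indent=2))
```

Granting to the `:root` principal delegates the decision of *which* identities in the other account may use the access to that account’s IAM administrators.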
## Making something public intentionally
For a static website or public downloads:
- Disable the relevant Block Public Access settings for that bucket
- Attach a bucket policy granting `s3:GetObject` to `*`
- Optionally front with CloudFront to avoid bucket-level public exposure — users fetch via CloudFront, and the bucket stays locked down via origin access control (OAC)
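The public-read policy itself is short; bucket name below is a placeholder:

```python
import json

# The canonical public-read policy: anonymous GetObject on every object.
# Applies only after the relevant Block Public Access settings are relaxed.
public_read = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-company-logs/*",
    }],
}
print(json.dumps(public_read, indent=2))
```

Note the `Resource` is objects only (`/*`): it deliberately does not allow listing the bucket.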
## Encryption
- At rest: enabled by default on all new objects (SSE-S3, AES-256, AWS-managed keys). You can upgrade to SSE-KMS for customer-managed keys or SSE-C to provide your own keys.
- In transit: HTTPS by default; enforce it via a bucket policy that denies requests where `"aws:SecureTransport"` is `"false"`, blocking any plain-HTTP request.
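The enforce-TLS policy is a standard pattern: an explicit Deny keyed on the `aws:SecureTransport` condition (bucket name is a placeholder):

```python
import json

# Explicit Deny beats any Allow, so this blocks plain-HTTP requests from
# every principal regardless of what other policies grant.
deny_http = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::my-company-logs",
            "arn:aws:s3:::my-company-logs/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
print(json.dumps(deny_http, indent=2))
```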
## Versioning
When enabled, every PUT creates a new version instead of overwriting. Deletes create a “delete marker” but don’t remove old versions.
- Recovers from accidental overwrites and deletes
- Works with MFA Delete for extra protection on production buckets
- Costs accumulate — all versions are stored; pair with lifecycle rules to age out old versions
For any bucket holding important data (IaC state files, backups, compliance data): enable versioning. Non-negotiable.
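The PUT/DELETE semantics above can be modeled in a few lines. This is a toy model with made-up version IDs, not the S3 API, but it shows why a delete is recoverable:

```python
from collections import defaultdict

# Toy model of a versioning-enabled bucket: PUT appends a new version,
# DELETE appends a delete marker, nothing is overwritten in place.
class VersionedBucket:
    def __init__(self):
        self.versions = defaultdict(list)  # key -> [(version_id, body), ...]
        self._counter = 0

    def put(self, key, body):
        self._counter += 1
        self.versions[key].append((f"v{self._counter}", body))

    def delete(self, key):
        self._counter += 1
        self.versions[key].append((f"v{self._counter}", None))  # delete marker

    def get(self, key):
        # A plain GET sees only the newest version; a delete marker on top
        # makes the object look gone, though older versions still exist.
        version_id, body = self.versions[key][-1]
        if body is None:
            raise KeyError(f"{key}: newest version is a delete marker")
        return body

bucket = VersionedBucket()
bucket.put("index.html", "first draft")
bucket.put("index.html", "second draft")   # overwrite -> new version
bucket.delete("index.html")                # delete -> marker on top
print(len(bucket.versions["index.html"]))  # → 3
```

Restoring is just removing the marker (or copying an old version back), and the “costs accumulate” bullet follows directly: all three entries are billed until a lifecycle rule expires the noncurrent ones.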
## Common features you’ll touch
- Presigned URLs — generate a time-limited URL that lets someone download or upload without needing AWS credentials. The standard way to share a single object securely.
- Multipart upload — required for objects >5 GB, recommended for anything >100 MB. Uploads in parallel chunks; SDKs handle it automatically.
- S3 Select — run SQL queries against objects (CSV, JSON, Parquet) without downloading
- Event notifications — trigger Lambda, SQS, SNS when objects are created/deleted
- Replication (CRR/SRR) — async copy to another bucket, cross-region (CRR) or same-region (SRR)
- Object Lock — WORM (write-once, read-many) for compliance; combined with versioning for immutable storage
- Transfer Acceleration — upload via the nearest CloudFront edge to reach S3 faster
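Of these, presigned URLs are worth understanding mechanically. The sketch below is NOT the real AWS Signature Version 4 algorithm — it is a simplified HMAC scheme showing the idea: the credential holder signs the request description plus an expiry, and the bearer of the URL needs no credentials of their own:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"not-a-real-signing-key"  # stands in for the AWS credential

def presign(bucket, key, expires_in, now=None):
    """Produce a time-limited URL for GET on one object (conceptual only)."""
    expiry = int(now if now is not None else time.time()) + expires_in
    to_sign = f"GET\n{bucket}\n{key}\n{expiry}".encode()
    signature = hmac.new(SECRET, to_sign, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expiry, "Signature": signature})
    return f"https://{bucket}.s3.amazonaws.com/{key}?{query}"

def verify(bucket, key, expiry, signature, now=None):
    """What the server does on receipt: check expiry, recompute, compare."""
    if int(now if now is not None else time.time()) > int(expiry):
        return False  # link has expired
    to_sign = f"GET\n{bucket}\n{key}\n{int(expiry)}".encode()
    expected = hmac.new(SECRET, to_sign, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

print(presign("my-company-logs", "backups/db-2026-04-23.sql.gz", 3600))
```

Because the expiry is inside the signed string, tampering with the `Expires` query parameter invalidates the signature. In practice you never implement this yourself; the SDKs generate real SigV4 presigned URLs for you.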
## Common pitfalls
- Public buckets. Historically the cause of many data leaks. Modern defaults protect you; don’t disable them without deliberation.
- Using S3 like a filesystem. S3 has no locking, no append, no rename (rename = PUT + DELETE). Many concurrent writers to the same key will have last-write-wins. For mutable shared state, use DynamoDB or a proper database.
- Cost surprises:
- Egress to internet (especially from CloudFront misses) is billed per GB
- Lots of small files → per-request costs dominate
- Versioning without lifecycle → storage bloats forever
- Forgetting bucket naming constraints — lowercase, 3–63 chars, no underscores, globally unique. Plan names ahead.
- IAM + bucket policy conflicts. Both must allow. A bucket policy denying your IAM-blessed access wins (explicit deny).
- Glacier retrieval latency. If your DR plan says “restore from Glacier in minutes”, confirm the retrieval tier matches.
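The small-files cost pitfall is easy to quantify. The prices below are assumptions in the ballpark of S3 Standard in us-east-1; check current pricing before relying on the numbers:

```python
# Illustrative arithmetic for the "lots of small files" pitfall.
PUT_COST_PER_1000 = 0.005       # $ per 1,000 PUT requests (assumed)
STORAGE_PER_GB_MONTH = 0.023    # $ per GB-month (assumed)

def put_cost(total_gb, object_size_kb):
    """Request cost to upload total_gb split into equal-sized objects."""
    n_objects = total_gb * 1024 * 1024 / object_size_kb
    return n_objects / 1000 * PUT_COST_PER_1000

one_tb = 1024  # GB
print(f"1 TB as 100 KB objects: ${put_cost(one_tb, 100):,.2f} in PUTs")
print(f"1 TB as 100 MB objects: ${put_cost(one_tb, 100 * 1024):,.2f} in PUTs")
print(f"1 TB stored one month:  ${one_tb * STORAGE_PER_GB_MONTH:,.2f}")
```

Under these assumptions, uploading a terabyte as 100 KB objects costs about twice as much in request fees as a month of storing it, while the same data as 100 MB objects costs pennies to upload. Batching small records (e.g. into compressed archives) sidesteps the problem.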
## The role of S3 in the broader AWS ecosystem
You’ll touch S3 indirectly through almost everything:
- CloudTrail logs are delivered to S3
- CloudFront serves content from S3 origins
- Athena / Redshift Spectrum query data in S3 (data lake pattern)
- Lambda reads/writes S3, triggered by S3 events
- Terraform state is commonly stored in S3 (+ DynamoDB for locking)
- EC2 AMIs are internally backed by S3
- EBS snapshots are stored in S3 (not visible to you, but that’s where they live)
S3 is the most-touched service in AWS. Knowing its quirks compounds.
## See also
- SAN vs NAS — where S3 fits among storage paradigms
- AWS IAM Fundamentals — how access control layers interact
- Encryption, Authentication
- Backup Fundamentals — RPO and RTO