# AWS S3 Fundamentals
S3 — Simple Storage Service — is AWS’s object store. It was the first AWS service (2006) and has become one of the foundational building blocks of the cloud. Almost every other AWS service interacts with S3 in some form.
## What S3 is
S3 is object storage — not file, not block. The distinction matters:
| | Block | File | Object |
|---|---|---|---|
| Accessed by | Block offset | Path (/home/user/file) | Flat key over HTTP |
| Protocol | iSCSI, NVMe-oF | NFS, SMB | HTTPS (S3 API) |
| Mutability | Read/write anywhere | Read/write anywhere | Replace whole objects |
| Max size | Depends on filesystem | Depends on filesystem | 5 TB per object |
| Scale | TBs | TBs | Effectively unlimited |
| AWS example | EBS | EFS | S3 |
See SAN vs NAS for the block/file/object storage deep-dive.
## Buckets, objects, keys
- Bucket — the container. Globally unique name (`my-company-logs`), bound to one region.
- Object — any file-like blob. Has a key (the “filename”), metadata, and data.
- Key — a string up to 1024 bytes. Forward slashes are allowed in keys and displayed as folders in the console, but there are no real folders — just keys with `/` in them.
```
Bucket: my-company-logs
Objects:
  logs/2026/04/23/app.log
  logs/2026/04/24/app.log
  backups/db-2026-04-23.sql.gz
  index.html
```
Treating those slashes as folders is a useful convention; the underlying model is flat.
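The flat model is easy to see in code. The sketch below mimics how a prefix/delimiter listing (the mechanism behind `ListObjectsV2` and the console’s folder view) derives “folders” from nothing but flat keys; the function is illustrative, not an AWS API.

```python
# The console's "folders" are computed from a flat key list, the way a
# prefix + delimiter listing works. Nothing here is a real directory.
keys = [
    "logs/2026/04/23/app.log",
    "logs/2026/04/24/app.log",
    "backups/db-2026-04-23.sql.gz",
    "index.html",
]

def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic S3's prefix/delimiter listing over a flat key space."""
    common_prefixes, contents = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter looks like a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(common_prefixes), contents

print(list_with_delimiter(keys))
# → (['backups/', 'logs/'], ['index.html'])
print(list_with_delimiter(keys, prefix="logs/2026/04/"))
# → (['logs/2026/04/23/', 'logs/2026/04/24/'], [])
```

Listing with a prefix but no delimiter would instead return every matching key, which is why deep “folder trees” cost nothing extra: they never existed.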
## Regions, endpoints, global names
- Bucket names are globally unique across all AWS accounts (a historical API design choice)
- Data lives in the region where the bucket was created
- Accessed via regional endpoints, normally virtual-hosted style (`https://my-bucket.s3.eu-central-1.amazonaws.com/key`); the older path style (`https://s3.eu-central-1.amazonaws.com/my-bucket/key`) is deprecated for new buckets
- Cross-region access is not automatic — S3 Replication (CRR) is a feature you explicitly configure for backups/DR
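The two addressing styles are just different URL shapes over the same bucket. A quick sketch, with example bucket/region/key values:

```python
# Two ways to address the same object; the bucket name moves between the
# hostname (virtual-hosted) and the path (path style).
def virtual_hosted(bucket, region, key):
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def path_style(bucket, region, key):
    # older style, deprecated for new buckets
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(virtual_hosted("my-company-logs", "eu-central-1", "index.html"))
# → https://my-company-logs.s3.eu-central-1.amazonaws.com/index.html
```

The hostname-embedded bucket name is one reason bucket names must be DNS-compatible (lowercase, no underscores).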
## Storage classes
S3 is not one storage tier — it’s several, each with different cost/availability/retrieval profiles.
| Class | Purpose | Retrieval | Relative cost |
|---|---|---|---|
| Standard | Default; hot data | Instant | $$$ |
| Intelligent-Tiering | Auto-tier based on access | Instant | $$ |
| Standard-IA | Infrequent access, needs instant retrieval | Instant (but retrieval fee) | $$ |
| One Zone-IA | IA, single AZ (lower durability) | Instant | $ |
| Glacier Instant Retrieval | Archive, rare access | Instant | $ |
| Glacier Flexible Retrieval | Archive | Minutes to hours | $ |
| Glacier Deep Archive | Long-term archive (7+ years) | Hours | ¢ |
Rules of thumb:
- Don’t move small objects to Glacier — per-object metadata overhead and minimum storage duration charges can eat the savings
- Use Intelligent-Tiering when access patterns are unknown; it auto-optimises
- Lifecycle rules automate transitions: “after 90 days, move to IA; after 1 year, to Glacier; after 7 years, delete”
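The quoted lifecycle rule of thumb can be written out as the configuration document S3 accepts (the shape used by `put-bucket-lifecycle-configuration`); the rule ID and prefix here are illustrative:

```python
import json

# "After 90 days, move to IA; after 1 year, to Glacier; after 7 years,
# delete" as a lifecycle configuration. ID and Prefix are placeholders.
lifecycle = {
    "Rules": [{
        "ID": "age-out-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},       # only keys under logs/
        "Transitions": [
            {"Days": 90, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 2555},        # ~7 years, then delete
    }]
}
print(json.dumps(lifecycle, indent=2))
```

Transitions and expiration are evaluated per object from its creation time, so a rule applied today also ages out objects uploaded years ago.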
## Durability and availability
- Durability — probability of not losing an object. S3 Standard is designed for eleven nines (99.999999999%) of durability — statistically, if you store 10 million objects you can expect to lose a single object about once every 10,000 years. This is achieved via redundant storage across multiple AZs.
- Availability — probability of being able to access an object right now. S3 Standard targets 99.99%; One Zone-IA drops to 99.5%.
These are not the same. Data is almost never lost. Being able to read it right now is a separate guarantee.
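The eleven-nines figure turns into the “10 million objects, one loss per 10,000 years” illustration with one line of arithmetic, assuming the design target means roughly a 1e-11 annual loss probability per object:

```python
# Back-of-envelope reading of "eleven nines" durability.
durability = 0.99999999999          # eleven nines
annual_loss_prob = 1 - durability   # ~1e-11 per object per year (assumed)
objects = 10_000_000

expected_losses_per_year = objects * annual_loss_prob   # ~1e-4
years_per_lost_object = 1 / expected_losses_per_year    # ~10,000
print(f"Storing {objects:,} objects: expect ~1 loss every "
      f"{years_per_lost_object:,.0f} years")
```

Run the same arithmetic with a 99.99% availability target and you get roughly 52 minutes of unavailability per year: a vastly weaker guarantee, which is the point of the distinction.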
## Access control — the layers
Four mechanisms combine to determine access. They are evaluated together.
- IAM policies (identity-based) — what your principals are allowed to do across S3
- Bucket policies (resource-based JSON on the bucket) — who can touch this specific bucket
- ACLs (legacy per-object and per-bucket) — mostly avoid; modern default has ACLs disabled
- Block Public Access — a global override; when enabled (as it is by default on new buckets), it blocks anything public regardless of the other layers
The practical modern approach:
- Block Public Access: ON on every bucket (default since 2023)
- Grant access via bucket policy for cross-account and via IAM for same-account
- Leave ACLs disabled (the “Object Ownership: Bucket owner enforced” setting — now default on new buckets)
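For the cross-account case, a minimal bucket policy looks like the sketch below; the account ID and bucket name are placeholders, and the other account’s own IAM policies must also allow the actions:

```python
import json

# Hypothetical cross-account read grant. 111122223333 and my-company-logs
# are placeholders. Note ListBucket targets the bucket ARN while GetObject
# targets object ARNs, hence both Resource entries.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-company-logs",
            "arn:aws:s3:::my-company-logs/*",
        ],
    }],
}
print(json.dumps(policy, indent=2))
```

Granting to the `:root` principal delegates the decision of *which* identities in the other account may use the access to that account’s IAM administrators.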
## Making something public intentionally
For a static website or public downloads:
- Disable the relevant Block Public Access settings for that bucket
- Attach a bucket policy granting `s3:GetObject` to `*`
- Optionally front with CloudFront to avoid bucket-level public exposure — users fetch via CloudFront, and the bucket stays locked down via origin access control (OAC)
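The public-read policy itself is short; bucket name below is a placeholder:

```python
import json

# The canonical public-read policy: anonymous GetObject on every object.
# Applies only after the relevant Block Public Access settings are relaxed.
public_read = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-company-logs/*",
    }],
}
print(json.dumps(public_read, indent=2))
```

Note the `Resource` is objects only (`/*`): it deliberately does not allow listing the bucket.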
## Encryption
- At rest: enabled by default on all new objects (SSE-S3, AES-256, AWS-managed keys). You can upgrade to SSE-KMS for customer-managed keys or SSE-C to provide your own keys.
- In transit: HTTPS by default; enforce it via a bucket policy that denies requests where `"aws:SecureTransport"` is `"false"`, blocking any plain-HTTP request.
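The enforce-TLS policy is a standard pattern: an explicit Deny keyed on the `aws:SecureTransport` condition (bucket name is a placeholder):

```python
import json

# Explicit Deny beats any Allow, so this blocks plain-HTTP requests from
# every principal regardless of what other policies grant.
deny_http = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::my-company-logs",
            "arn:aws:s3:::my-company-logs/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
print(json.dumps(deny_http, indent=2))
```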
## Versioning
When enabled, every PUT creates a new version instead of overwriting. Deletes create a “delete marker” but don’t remove old versions.
- Recovers from accidental overwrites and deletes
- Works with MFA Delete for extra protection on production buckets
- Costs accumulate — all versions are stored; pair with lifecycle rules to age out old versions
For any bucket holding important data (IaC state files, backups, compliance data): enable versioning. Non-negotiable.
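The PUT/DELETE semantics above can be modeled in a few lines. This is a toy model with made-up version IDs, not the S3 API, but it shows why a delete is recoverable:

```python
from collections import defaultdict

# Toy model of a versioning-enabled bucket: PUT appends a new version,
# DELETE appends a delete marker, nothing is overwritten in place.
class VersionedBucket:
    def __init__(self):
        self.versions = defaultdict(list)  # key -> [(version_id, body), ...]
        self._counter = 0

    def put(self, key, body):
        self._counter += 1
        self.versions[key].append((f"v{self._counter}", body))

    def delete(self, key):
        self._counter += 1
        self.versions[key].append((f"v{self._counter}", None))  # delete marker

    def get(self, key):
        # A plain GET sees only the newest version; a delete marker on top
        # makes the object look gone, though older versions still exist.
        version_id, body = self.versions[key][-1]
        if body is None:
            raise KeyError(f"{key}: newest version is a delete marker")
        return body

bucket = VersionedBucket()
bucket.put("index.html", "first draft")
bucket.put("index.html", "second draft")   # overwrite -> new version
bucket.delete("index.html")                # delete -> marker on top
print(len(bucket.versions["index.html"]))  # → 3
```

Restoring is just removing the marker (or copying an old version back), and the “costs accumulate” bullet follows directly: all three entries are billed until a lifecycle rule expires the noncurrent ones.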
## Common features you’ll touch
- Presigned URLs — generate a time-limited URL that lets someone download or upload without needing AWS credentials. The standard way to share a single object securely.
- Multipart upload — required for objects >5 GB, recommended for anything >100 MB. Uploads in parallel chunks; SDKs handle it automatically.
- S3 Select — run SQL queries against objects (CSV, JSON, Parquet) without downloading
- Event notifications — trigger Lambda, SQS, SNS when objects are created/deleted
- Replication (CRR/SRR) — async copy to another bucket, cross-region (CRR) or same-region (SRR)
- Object Lock — WORM (write-once, read-many) for compliance; combined with versioning for immutable storage
- Transfer Acceleration — upload via the nearest CloudFront edge to reach S3 faster
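Of these, presigned URLs are worth understanding mechanically. The sketch below is NOT the real AWS Signature Version 4 algorithm — it is a simplified HMAC scheme showing the idea: the credential holder signs the request description plus an expiry, and the bearer of the URL needs no credentials of their own:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"not-a-real-signing-key"  # stands in for the AWS credential

def presign(bucket, key, expires_in, now=None):
    """Produce a time-limited URL for GET on one object (conceptual only)."""
    expiry = int(now if now is not None else time.time()) + expires_in
    to_sign = f"GET\n{bucket}\n{key}\n{expiry}".encode()
    signature = hmac.new(SECRET, to_sign, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expiry, "Signature": signature})
    return f"https://{bucket}.s3.amazonaws.com/{key}?{query}"

def verify(bucket, key, expiry, signature, now=None):
    """What the server does on receipt: check expiry, recompute, compare."""
    if int(now if now is not None else time.time()) > int(expiry):
        return False  # link has expired
    to_sign = f"GET\n{bucket}\n{key}\n{int(expiry)}".encode()
    expected = hmac.new(SECRET, to_sign, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

print(presign("my-company-logs", "backups/db-2026-04-23.sql.gz", 3600))
```

Because the expiry is inside the signed string, tampering with the `Expires` query parameter invalidates the signature. In practice you never implement this yourself; the SDKs generate real SigV4 presigned URLs for you.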
## Common pitfalls
- Public buckets. Historically the cause of many data leaks. Modern defaults protect you; don’t disable them without deliberation.
- Using S3 like a filesystem. S3 has no locking, no append, no rename (rename = PUT + DELETE). Many concurrent writers to the same key will have last-write-wins. For mutable shared state, use DynamoDB or a proper database.
- Cost surprises:
- Egress to internet (especially from CloudFront misses) is billed per GB
- Lots of small files → per-request costs dominate
- Versioning without lifecycle → storage bloats forever
- Forgetting bucket naming constraints — lowercase, 3–63 chars, no underscores, globally unique. Plan names ahead.
- IAM + bucket policy conflicts. Both must allow. A bucket policy denying your IAM-blessed access wins (explicit deny).
- Glacier retrieval latency. If your DR plan says “restore from Glacier in minutes”, confirm the retrieval tier matches.
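The small-files cost pitfall is easy to quantify. The prices below are assumptions in the ballpark of S3 Standard in us-east-1; check current pricing before relying on the numbers:

```python
# Illustrative arithmetic for the "lots of small files" pitfall.
PUT_COST_PER_1000 = 0.005       # $ per 1,000 PUT requests (assumed)
STORAGE_PER_GB_MONTH = 0.023    # $ per GB-month (assumed)

def put_cost(total_gb, object_size_kb):
    """Request cost to upload total_gb split into equal-sized objects."""
    n_objects = total_gb * 1024 * 1024 / object_size_kb
    return n_objects / 1000 * PUT_COST_PER_1000

one_tb = 1024  # GB
print(f"1 TB as 100 KB objects: ${put_cost(one_tb, 100):,.2f} in PUTs")
print(f"1 TB as 100 MB objects: ${put_cost(one_tb, 100 * 1024):,.2f} in PUTs")
print(f"1 TB stored one month:  ${one_tb * STORAGE_PER_GB_MONTH:,.2f}")
```

Under these assumptions, uploading a terabyte as 100 KB objects costs about twice as much in request fees as a month of storing it, while the same data as 100 MB objects costs pennies to upload. Batching small records (e.g. into compressed archives) sidesteps the problem.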
## The role of S3 in the broader AWS ecosystem
You’ll touch S3 indirectly through almost everything:
- CloudTrail logs are delivered to S3
- CloudFront serves content from S3 origins
- Athena / Redshift Spectrum query data in S3 (data lake pattern)
- Lambda reads/writes S3, triggered by S3 events
- Terraform state is commonly stored in S3 (+ DynamoDB for locking)
- EC2 AMIs are internally backed by S3
- EBS snapshots are stored in S3 (not visible to you, but that’s where they live)
S3 is the most-touched service in AWS. Knowing its quirks compounds.
## See also
- SAN vs NAS — where S3 fits among storage paradigms
- AWS IAM Fundamentals — how access control layers interact
- Encryption, Authentication
- Backup Fundamentals — RPO and RTO