AWS Route 53
Route 53 is AWS’s DNS. It’s a global service (not region-bound), and it’s more than a name resolver — the routing policies turn it into a global traffic manager and a primary HA mechanism. For a network engineer, think of it as managed authoritative DNS with health-check-driven policy built in.
What Route 53 is
Three distinct products under one name:
- Authoritative DNS — public hosted zones and private hosted zones
- Domain registrar — buy/transfer domains directly in AWS
- Health checks + routing policies — the “traffic manager” layer
The authoritative piece is what matters most. The registrar is optional — you can host DNS in Route 53 with a domain registered anywhere.
Hosted zones — public vs private
A hosted zone is a container for records for one domain (e.g. example.com).
| | Public hosted zone | Private hosted zone |
|---|---|---|
| Resolved from | Anywhere on the internet | Only from associated VPCs |
| Use | External-facing records | Internal service discovery |
| Authoritative NS | Route 53’s four assigned name servers | AWS internal resolver only |
| Records can resolve to | Anything | Anything (often private IPs) |
Split-horizon DNS is straightforward: create both a public and a private hosted zone for the same name. Public queries get public IPs; VPC queries get private IPs. Route 53 doesn’t merge them — they’re independent zones that happen to share a name.
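The two-zone behaviour can be sketched as a toy resolver; the names and IPs below are illustrative placeholders, not real resources:

```python
# Split-horizon in miniature: two independent zones that share a name.
PUBLIC_ZONE = {"app.example.com": "203.0.113.10"}   # public hosted zone
PRIVATE_ZONE = {"app.example.com": "10.0.1.10"}     # private hosted zone (VPC-only)

def resolve(name, from_vpc):
    """VPC queries see the private zone; internet queries only see the public one."""
    if from_vpc and name in PRIVATE_ZONE:
        return PRIVATE_ZONE[name]
    return PUBLIC_ZONE.get(name)

print(resolve("app.example.com", from_vpc=False))  # 203.0.113.10
print(resolve("app.example.com", from_vpc=True))   # 10.0.1.10
```

Note the zones never merge: deleting the private record doesn’t “fall through” to a merged view, it just means VPC queries get the public answer.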
Record types you’ll actually use
- A / AAAA — name → IPv4 / IPv6
- CNAME — name → another name (can’t be at zone apex!)
- Alias — Route 53-specific. Like a CNAME, but can point at AWS resources (ALB, CloudFront, S3 website, API Gateway) and works at the zone apex. Free queries. Use aliases for AWS targets wherever possible.
- MX / TXT / NS / PTR / SRV / CAA — standard
- NS at apex = Route 53’s nameservers for this zone
The apex-CNAME trick. DNS forbids CNAME at the zone apex (example.com). Classic workaround: run the apex on A records pointing to a fixed IP. Route 53’s Alias record solves this — it behaves like a CNAME to an AWS resource, stored as A/AAAA at the DNS level. Use alias records anytime you’d want example.com → my-alb-....
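An apex alias looks like this on the wire to the Route 53 API — the shape below matches the change batch `ChangeResourceRecordSets` expects, but the zone ID and ALB DNS name are placeholders, not real resources:

```python
# Illustrative change batch for an apex Alias record. All identifiers are
# placeholders; in practice HostedZoneId is the ALB's canonical hosted zone ID,
# which AWS publishes per region.
change_batch = {
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "example.com.",   # zone apex -- a CNAME here would be rejected
            "Type": "A",              # alias is stored as A/AAAA at the DNS level
            "AliasTarget": {
                "HostedZoneId": "Z_ALB_ZONE_ID",                     # placeholder
                "DNSName": "my-alb.us-east-1.elb.amazonaws.com.",    # placeholder
                "EvaluateTargetHealth": True,
            },
        },
    }]
}

# Alias records carry no TTL of their own -- they inherit the target's.
assert "TTL" not in change_batch["Changes"][0]["ResourceRecordSet"]
```

Setting `EvaluateTargetHealth` makes the alias participate in health-aware routing without a separate health check.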
Routing policies — the differentiator
A record can have a routing policy that governs which answer gets returned when multiple records share the same name.
| Policy | Behaviour |
|---|---|
| Simple | One record, one answer. Default. |
| Weighted | Split traffic by percentage across records. Canary deploys, A/B. |
| Latency-based | Return the record whose region is closest (lowest latency) to the resolver. Global active/active apps. |
| Geolocation | Route by the resolver’s country/continent. Compliance-driven routing, localised content. |
| Geoproximity | Route by geographic distance with optional bias. Requires Traffic Flow. |
| Failover | Primary / secondary. Health-check-driven. Active/passive HA. |
| Multivalue answer | Return multiple healthy records (up to 8), randomised. Poor-man’s load balancer with health checks. |
| IP-based | Route by resolver IP / CIDR. Sticky routing by network. |
Multiple record sets sharing a name must all use the same policy type (can’t mix Weighted with Latency for the same name).
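Weighted routing is the easiest policy to reason about numerically: each record is returned with probability weight/total. A minimal sketch of that selection, with made-up record names and a 90/10 canary split:

```python
import random

# Weighted answer selection: each record's chance is weight / total weight.
# Route 53 does this server-side; the records and weights here are examples.
records = [("v1.example.internal", 90), ("v2-canary.example.internal", 10)]

def pick(records, rng=random):
    names = [name for name, _ in records]
    weights = [weight for _, weight in records]
    return rng.choices(names, weights=weights, k=1)[0]

counts = {name: 0 for name, _ in records}
for _ in range(10_000):
    counts[pick(records)] += 1
# Over many queries the split converges on roughly 90/10.
```

Setting a record’s weight to 0 takes it out of rotation entirely, which is the standard way to drain a canary.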
Health checks — the HA engine
A health check probes an endpoint (HTTP/HTTPS/TCP) from multiple AWS vantage points and aggregates a healthy/unhealthy verdict. Attach a health check to a record; unhealthy records are withheld from responses.
Flavours:
- Endpoint health check — HTTP/HTTPS/TCP to a specific IP or hostname
- Calculated health check — boolean of multiple child health checks (AND/OR semantics)
- CloudWatch-alarm health check — treat a CloudWatch alarm state as health
Key knobs:
- Interval: 10s (fast, costs more) or 30s (standard)
- Failure threshold: how many consecutive failures before “unhealthy”
- String matching: require the response body to contain a literal string
- Latency graphs: charted in the console
- Inverted: treat “failing” as healthy (useful for maintenance-page logic)
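The two aggregation layers above — a failure threshold turning consecutive probe failures into a verdict, and a calculated check combining child verdicts — can be modelled in a few lines. Thresholds and semantics here are a simplified sketch, not Route 53’s exact algorithm:

```python
# Endpoint verdict: unhealthy once the last `failure_threshold` probes all failed.
def endpoint_healthy(probe_results, failure_threshold=3):
    tail = probe_results[-failure_threshold:]
    return not (len(tail) == failure_threshold and not any(tail))

# Calculated check: boolean combination of child health checks.
def calculated_healthy(children, mode="AND"):
    return all(children) if mode == "AND" else any(children)

assert endpoint_healthy([True, False, False]) is True       # only 2 consecutive failures
assert endpoint_healthy([True, False, False, False]) is False
assert calculated_healthy([True, False], mode="OR") is True
```

Real calculated checks are more flexible (e.g. “at least N of M children healthy”), but AND/OR covers the common cases.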
The classic failover pattern
example.com (FAILOVER policy)
├── Primary → ALB in us-east-1 [health check: /health]
└── Secondary → static S3 page [no health check needed]
If the primary health check fails, Route 53 returns the secondary. DNS TTL controls how fast clients reconverge (typically 60s).
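The selection logic plus the TTL-bounded reconvergence can be summarised in a few lines; record values and the TTL are illustrative placeholders:

```python
TTL = 60  # seconds -- the typical low TTL mentioned above

def failover_answer(primary_healthy):
    """Return the secondary only when the primary's health check fails."""
    primary = "alb-primary.us-east-1.elb.amazonaws.com"    # placeholder ALB name
    secondary = "failover-page.s3-website.amazonaws.com"   # placeholder S3 site
    return primary if primary_healthy else secondary

assert failover_answer(True).startswith("alb-primary")
assert failover_answer(False).startswith("failover-page")
# Worst-case client staleness after a failover is bounded by the TTL (~60 s here),
# plus the health-check detection time (interval x failure threshold).
```

This is why the TTL-too-high pitfall below bites hardest on failover records: the secondary only helps once caches expire.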
Resolvers — the VPC side
Inside a VPC, the Amazon-provided DNS resolver runs at VPC_CIDR + 2 (so 10.0.0.0/16 → 10.0.0.2). Behaviour:
- Resolves public names via Route 53 public DNS
- Resolves private hosted zone names for zones associated with this VPC
- Resolves AWS service endpoints to regional addresses (for interface endpoints: to private IPs)
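The “+2” rule is just base-address arithmetic on the VPC CIDR, which the stdlib `ipaddress` module makes explicit:

```python
import ipaddress

# Amazon's VPC resolver sits at the VPC CIDR's base address plus 2.
def vpc_resolver_ip(cidr):
    net = ipaddress.ip_network(cidr)
    return str(net.network_address + 2)

print(vpc_resolver_ip("10.0.0.0/16"))     # 10.0.0.2
print(vpc_resolver_ip("172.31.0.0/16"))   # 172.31.0.2
```

(AWS also exposes the resolver at the link-local address 169.254.169.253 from within the VPC.)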
Route 53 Resolver is the productised version, offering:
- Inbound endpoints — on-prem DNS servers can forward AWS names here (resolves private hosted zones for on-prem)
- Outbound endpoints — VPC resolver forwards certain zones (e.g. corp.internal) to on-prem DNS
- Resolver rules — conditional forwarding per zone
- Resolver Query Logging — every DNS query from the VPC, written to CloudWatch Logs / S3 / Firehose
This is how you build hybrid DNS — bidirectional resolution between AWS and on-prem.
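Resolver rules match by domain suffix, with the most specific rule winning — a sketch of that matching, with placeholder on-prem resolver IPs:

```python
# Rule matching: the longest matching domain suffix wins; unmatched names fall
# through to normal recursion. Targets are placeholder on-prem resolver IPs.
RULES = {
    "corp.internal": ["10.10.0.53"],       # forward the whole corp zone on-prem
    "dev.corp.internal": ["10.20.0.53"],   # more specific rule overrides it
}

def match_rule(qname):
    candidates = [d for d in RULES if qname == d or qname.endswith("." + d)]
    if not candidates:
        return None  # falls through to the VPC resolver's normal recursion
    return RULES[max(candidates, key=len)]

assert match_rule("db.dev.corp.internal") == ["10.20.0.53"]
assert match_rule("mail.corp.internal") == ["10.10.0.53"]
assert match_rule("example.com") is None
```

The longest-suffix-wins behaviour is what lets you carve out a sub-zone (dev.corp.internal) to a different resolver without disturbing the parent rule.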
Private DNS for service endpoints
Many AWS services have a public DNS name. When you use VPC Interface Endpoints, enabling Private DNS for the endpoint rewrites the service’s public name to resolve to the endpoint’s private IP inside the VPC — no code change needed.
This interaction lives entirely inside the VPC resolver and doesn’t show up in any hosted zone.
DNSSEC
Route 53 supports DNSSEC signing for public hosted zones:
- KSK stored in KMS (customer-managed CMK in us-east-1)
- ZSK managed by Route 53
- Parent zone must have a matching DS record (for .com, the registrar uploads it)
DNSSEC validation (resolver-side) is not yet on by default in Route 53 Resolver — configurable per VPC.
Common pitfalls
- CNAME at apex — use an Alias record to the AWS target, not a CNAME.
- TTL too high during cutovers. Lower TTLs hours before a change so clients pick up the new answer quickly.
- Health-check target unreachable from AWS probers. Ensure SGs/NACLs on the target allow the Route 53 health-checker IPs (service: ROUTE53_HEALTHCHECKS).
- Mixing policy types under the same record name — not allowed; all siblings must share a policy.
- Private hosted zone association — a PHZ is useless unless associated with the right VPCs. Cross-account association is possible but needs explicit API calls.
- dig @VPC_DNS doesn’t work from outside a VPC. The resolver is only reachable from inside its VPC; Inbound Resolver endpoints expose it deliberately.
- Latency-based records are geolocated by resolver, not end-user. Users behind a remote recursive resolver (Google 8.8.8.8, Cloudflare 1.1.1.1) may route suboptimally. EDNS Client Subnet helps but is not always honoured.
Mental model for a network engineer
- Authoritative DNS with AWS-specific ergonomics (Alias records, policies, health checks)
- VPC resolver = the recursive resolver your instances use; it’s the seam where Route 53 meets the rest of the internet
- Route 53 Resolver endpoints = the “forwarding” you’d otherwise build with BIND/Unbound in hybrid setups
- Health check + Failover = managed VRRP-ish at the DNS layer, but works globally and at L7