AWS Route 53

Route 53 is AWS’s DNS. It’s a global service (not region-bound), and it’s more than a name resolver — the routing policies turn it into a global traffic manager and a primary HA mechanism. For a network engineer, think of it as managed authoritative DNS with health-check-driven policy built in.

What Route 53 is

Three distinct products under one name:

  1. Authoritative DNS — public hosted zones and private hosted zones
  2. Domain registrar — buy/transfer domains directly in AWS
  3. Health checks + routing policies — the “traffic manager” layer

The authoritative piece is what matters most. The registrar is optional — you can host DNS in Route 53 with a domain registered anywhere.

Hosted zones — public vs private

A hosted zone is a container for records for one domain (e.g. example.com).

                        Public hosted zone          Private hosted zone
Resolved from           Anywhere on the internet    Only from specified VPCs
Use                     External-facing records     Internal service discovery
Authoritative NS        Route 53's four assigned    AWS internal resolver only
                        name servers
Records can resolve to  Anything                    Anything (often private IPs)

Split-horizon DNS is straightforward: create both a public and a private hosted zone for the same name. Public queries get public IPs; VPC queries get private IPs. Route 53 doesn’t merge them — they’re independent zones that happen to share a name.
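The two-zone behaviour can be sketched as a toy model — two independent record maps that happen to share a name, with the answer picked by where the query comes from. All names and IPs below are illustrative, not real resources:

```python
# Toy split-horizon model: a public and a private hosted zone share the
# name example.com but hold completely independent record sets.
ZONES = {
    ("example.com", "public"):  {"app.example.com": "203.0.113.10"},
    ("example.com", "private"): {"app.example.com": "10.0.1.10"},
}

def resolve(name, from_associated_vpc):
    """A query from an associated VPC hits the private zone; the rest of
    the internet only ever sees the public zone."""
    visibility = "private" if from_associated_vpc else "public"
    return ZONES[("example.com", visibility)][name]

internet_answer = resolve("app.example.com", from_associated_vpc=False)  # public IP
vpc_answer = resolve("app.example.com", from_associated_vpc=True)        # private IP
```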

Record types you’ll actually use

  • A / AAAA — name → IPv4 / IPv6
  • CNAME — name → another name (can’t be at zone apex!)
  • Alias — Route 53-specific. Like a CNAME, but can point at AWS resources (ALB, CloudFront, S3 website, API Gateway) and works at the zone apex. Free queries. Use aliases for AWS targets wherever possible.
  • MX / TXT / NS / PTR / SRV / CAA — standard
  • NS at apex = Route 53’s nameservers for this zone

The apex-CNAME trick. DNS forbids CNAME at the zone apex (example.com). Classic workaround: run the apex on A records pointing to a fixed IP. Route 53’s Alias record solves this — it behaves like a CNAME to an AWS resource, stored as A/AAAA at the DNS level. Use alias records anytime you’d want example.com → my-alb-....
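When driven through the API (e.g. boto3's `change_resource_record_sets`), the alias lives inside a change batch. A minimal sketch of that payload as a plain dict — the ALB DNS name and hosted-zone IDs below are placeholders:

```python
# Sketch of a Route 53 change batch creating an apex Alias record.
# The ALB DNS name and zone IDs are placeholders, not real resources.
def alias_upsert(record_name, alb_dns_name, alb_hosted_zone_id):
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,   # the zone apex is allowed here
                "Type": "A",           # an alias is stored as A/AAAA
                "AliasTarget": {
                    "DNSName": alb_dns_name,
                    # Note: the ALB's hosted-zone ID, not your own zone's
                    "HostedZoneId": alb_hosted_zone_id,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    }

batch = alias_upsert("example.com.",
                     "my-alb-1234567890.us-east-1.elb.amazonaws.com.",
                     "ZEXAMPLEALBID")
```

Note the absence of a `TTL` and `ResourceRecords` — for an alias, Route 53 tracks the target's addresses itself.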

Routing policies — the differentiator

A record can have a routing policy that governs which answer gets returned when multiple records share the same name.

  • Simple — one record, one answer. The default.
  • Weighted — split traffic across records in proportion to their weights. Canary deploys, A/B.
  • Latency-based — return the record whose region has the lowest measured latency to the resolver. Global active/active apps.
  • Geolocation — route by the resolver's country/continent. Compliance-driven routing, localised content.
  • Geoproximity — route by geographic distance with optional bias. Requires Traffic Flow.
  • Failover — primary/secondary, health-check-driven. Active/passive HA.
  • Multivalue answer — return multiple healthy records (up to 8), randomised. Poor man's load balancer with health checks.
  • IP-based — route by the resolver's IP/CIDR. Sticky routing by network.

Multiple record sets sharing a name must all use the same policy type (can’t mix Weighted with Latency for the same name).
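Weighted routing is the easiest policy to reason about numerically: each record is returned with probability weight / sum(weights). A toy simulation of a 90/10 canary split, with illustrative record names:

```python
import random

# Toy weighted-routing selection: each record set is returned with
# probability weight / sum(weights). Record names are illustrative.
def pick_weighted(records, rng):
    names = [name for name, _ in records]
    weights = [weight for _, weight in records]
    return rng.choices(names, weights=weights, k=1)[0]

records = [("blue.example.com", 90), ("green.example.com", 10)]  # 90/10 canary
rng = random.Random(0)  # seeded for reproducibility
sample = [pick_weighted(records, rng) for _ in range(1000)]
# Roughly 90% of answers point at the blue record.
```

Setting a record's weight to 0 takes it out of rotation without deleting it — handy for staged rollbacks.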

Health checks — the HA engine

A health check probes an endpoint (HTTP/HTTPS/TCP) from multiple AWS vantage points and aggregates a healthy/unhealthy verdict. Attach a health check to a record; unhealthy records are withheld from responses.

Flavours:

  • Endpoint health check — HTTP/HTTPS/TCP to a specific IP or hostname
  • Calculated health check — boolean of multiple child health checks (AND/OR semantics)
  • CloudWatch-alarm health check — treat a CloudWatch alarm state as health

Key knobs:

  • Interval: 10s (fast, costs more) or 30s (standard)
  • Failure threshold: how many consecutive failures before “unhealthy”
  • String matching: require the response body to contain a literal string
  • Latency graphs: charted in the console
  • Inverted: treat “failing” as healthy (useful for maintenance-page logic)
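The threshold and calculated-check semantics can be sketched in a few lines — a minimal model, with all knob values illustrative:

```python
# Toy model of Route 53 health-check aggregation.
def endpoint_healthy(probe_results, failure_threshold=3):
    """An endpoint check flips unhealthy only once the last
    `failure_threshold` consecutive probes all failed."""
    tail = probe_results[-failure_threshold:]
    return not (len(tail) == failure_threshold and not any(tail))

def calculated_healthy(children, healthy_threshold):
    """A calculated check is healthy when at least `healthy_threshold`
    of its child checks are healthy: threshold == len(children) behaves
    like AND, threshold == 1 behaves like OR."""
    return sum(children) >= healthy_threshold

one_failure = endpoint_healthy([True, True, True, False])       # still healthy
three_failures = endpoint_healthy([True, False, False, False])  # unhealthy
any_child = calculated_healthy([True, False, True], 1)          # OR: healthy
all_children = calculated_healthy([True, False, True], 3)       # AND: unhealthy
```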

The classic failover pattern

example.com (FAILOVER policy)
   ├── Primary   → ALB in us-east-1  [health check: /health]
   └── Secondary → static S3 page    [no health check needed]

If the primary health check fails, Route 53 returns the secondary. DNS TTL controls how fast clients reconverge (typically 60s).
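The failover decision and the client-side reconvergence bound can be sketched together — the endpoint names and numbers below are illustrative:

```python
# Failover sketch: the answer flips to the secondary when the primary's
# health check fails; worst-case client cutover is bounded by the
# detection time plus the record's cached TTL.
def failover_answer(primary, secondary, primary_healthy):
    return primary if primary_healthy else secondary

def worst_case_cutover_seconds(ttl, interval, failure_threshold):
    """Time to detect the failure (interval x consecutive failures)
    plus time for already-cached answers to expire."""
    return interval * failure_threshold + ttl

answer = failover_answer("alb.us-east-1.example.com",
                         "static-page.example.com",
                         primary_healthy=False)
cutover = worst_case_cutover_seconds(ttl=60, interval=30, failure_threshold=3)
# 30s probes x 3 failures + 60s TTL = 150 seconds worst case
```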

Resolvers — the VPC side

Inside a VPC, the Amazon-provided DNS resolver runs at the VPC CIDR base address + 2 (so for 10.0.0.0/16, that's 10.0.0.2). Behaviour:

  • Resolves public names via Route 53 public DNS
  • Resolves private hosted zone names for zones associated with this VPC
  • Resolves AWS service endpoints to regional addresses (for interface endpoints: to private IPs)
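The "+2" address is a fixed offset from the VPC CIDR's base address, which is easy to compute:

```python
import ipaddress

# The Amazon-provided resolver sits at the VPC CIDR base address + 2.
def vpc_resolver_address(cidr):
    net = ipaddress.ip_network(cidr)
    return str(net.network_address + 2)

vpc_resolver_address("10.0.0.0/16")    # "10.0.0.2"
vpc_resolver_address("172.31.0.0/16")  # "172.31.0.2"
```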

Route 53 Resolver is the productised version, offering:

  • Inbound endpoints — on-prem DNS servers can forward AWS names here (resolves private hosted zones for on-prem)
  • Outbound endpoints — VPC resolver forwards certain zones (e.g. corp.internal) to on-prem DNS
  • Resolver rules — conditional forwarding per zone
  • Resolver Query Logging — every DNS query from the VPC, written to CloudWatch Logs / S3 / Firehose

This is how you build hybrid DNS — bidirectional resolution between AWS and on-prem.
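Resolver-rule matching can be sketched as a most-specific-suffix lookup: a query matching a rule's domain forwards to that rule's targets, and the longest matching domain wins. Rule domains and target IPs below are illustrative:

```python
# Toy conditional forwarding: the most specific (longest) matching
# rule domain wins; non-matching names fall through to Route 53.
RULES = {
    "corp.internal.":     ["10.10.0.53"],  # outbound to on-prem DNS
    "dev.corp.internal.": ["10.20.0.53"],  # more specific rule
}

def forward_targets(qname):
    matches = [domain for domain in RULES if qname.endswith(domain)]
    if not matches:
        return None                         # no rule: resolve normally
    return RULES[max(matches, key=len)]     # most specific rule wins

forward_targets("db.dev.corp.internal.")  # ["10.20.0.53"]
forward_targets("www.example.com.")       # None
```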

Private DNS for service endpoints

Many AWS services have a public DNS name. When you use VPC Interface Endpoints, enabling Private DNS for the endpoint rewrites the service’s public name to resolve to the endpoint’s private IP inside the VPC — no code change needed.

This interaction lives entirely inside the VPC resolver and doesn’t show up in any hosted zone.

DNSSEC

Route 53 supports DNSSEC signing for public hosted zones:

  • KSK backed by an asymmetric customer managed KMS key (must be in us-east-1)
  • ZSK managed by Route 53
  • Parent zone must have matching DS record (for .com, registrar uploads it)

DNSSEC validation (resolver-side) is off by default in Route 53 Resolver — it can be enabled per VPC.

Common pitfalls

  1. CNAME at apex — use an Alias record to the AWS target, not a CNAME.
  2. TTL too high during cutovers. Lower TTLs hours before a change so clients pick up the new answer quickly.
  3. Health-check target unreachable from AWS probers. Ensure SGs/NACLs on the target allow the Route 53 health-checker IPs (service: ROUTE53_HEALTHCHECKS).
  4. Mixing policy types under the same record name — not allowed; all siblings must share a policy.
  5. Private hosted zone association — a PHZ is useless unless associated with the right VPCs. Cross-account association is possible but needs explicit API calls.
  6. dig @VPC_DNS doesn’t work from outside a VPC. The resolver is only reachable from inside its VPC. Inbound Resolver endpoints expose it deliberately.
  7. Latency-based records are geolocated by resolver, not end-user. Users behind a remote recursive resolver (Google 8.8.8.8, Cloudflare 1.1.1.1) may route suboptimally. EDNS Client Subnet helps; not always honoured.

Mental model for a network engineer

  • Authoritative DNS with AWS-specific ergonomics (Alias records, policies, health checks)
  • VPC resolver = the recursive resolver your instances use; it’s the seam where Route 53 meets the rest of the internet
  • Route 53 Resolver endpoints = the “forwarding” you’d otherwise build with BIND/Unbound in hybrid setups
  • Health check + Failover = managed VRRP-ish at the DNS layer, but works globally and at L7

See also