Site-to-Site VPN vs Direct Connect

Two ways to connect on-prem (or another cloud) to AWS. One rides the public internet with IPsec; the other is a physical private circuit into an AWS router. They solve different problems — and production-grade hybrid often uses both.

Quick verdict

| Need | Choice |
|---|---|
| Fast to turn up, tolerant of internet jitter, cost-sensitive | Site-to-Site VPN |
| Consistent throughput, predictable latency, high volume | Direct Connect |
| "Never goes down" | Direct Connect + VPN backup |
| Compliance requires not touching the public internet | Direct Connect (but note: DX alone is unencrypted — add MACsec or IPsec-over-DX) |

Side-by-side

| Aspect | Site-to-Site VPN | Direct Connect (DX) |
|---|---|---|
| Transport | IPsec tunnel over the public internet | Dedicated fibre from your DC to an AWS DX location |
| Setup time | Minutes | Weeks to months (ordering circuit, cross-connect, physical install) |
| Encryption | Built-in (IPsec) | None by default — add MACsec (L2) or IPsec-over-DX |
| Throughput | Up to ~1.25 Gbps per tunnel; scale with multiple tunnels + ECMP | 1, 10, 100 Gbps dedicated; 50 Mbps–10 Gbps hosted via a partner |
| Latency / jitter | Internet-variable | Deterministic (your provider's SLA) |
| Pricing | Per-hour + data transfer out | Port-hour + data transfer out (DX egress is cheaper than internet egress) |
| Redundancy model | Two tunnels per connection (automatic) | Need 2 ports, ideally at 2 DX locations |
| BGP | Supported (dynamic) or static | BGP required on every VIF |
| Terminates on | VGW or TGW | Private VIF → VGW/DXGW/TGW; Transit VIF → DXGW → TGW |

The VPN side

A Site-to-Site VPN is an IPsec tunnel between an on-prem customer gateway (your firewall/router) and an AWS managed endpoint:

  • Terminates on a Virtual Private Gateway (VGW) attached to a single VPC, or on a Transit Gateway (scales to many VPCs)
  • AWS provisions two tunnels per VPN connection, terminating on endpoints in different Availability Zones. Redundancy is baked in.
  • Routing: static (explicit prefixes) or BGP (dynamic, preferred at scale). BGP is also a prerequisite for ECMP across tunnels.
  • Maximum throughput is per-tunnel — to exceed one tunnel’s limit, use multiple VPN connections with ECMP on TGW (active/active).
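ECMP spreads traffic per flow, not per packet: the router hashes each flow's 5-tuple to one tunnel, which is why a single flow can never exceed a single tunnel's cap. A minimal sketch of the idea (the hash function and field order here are illustrative; real routers use their own hardware hash):

```python
import hashlib

TUNNEL_CAP_GBPS = 1.25  # approximate per-tunnel IPsec throughput cap

def pick_tunnel(src_ip, dst_ip, src_port, dst_port, proto, n_tunnels):
    """Hash the 5-tuple to a tunnel index, as an ECMP router would."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return digest % n_tunnels

# Every packet of one flow hashes to the same tunnel, so one flow is
# pinned to ~1.25 Gbps; only many concurrent flows spread across tunnels.
flow = ("10.0.0.5", "172.16.1.9", 40512, 443, "tcp")
assert pick_tunnel(*flow, n_tunnels=4) == pick_tunnel(*flow, n_tunnels=4)
```

The practical consequence: ECMP raises aggregate throughput, not single-flow throughput.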

When VPN is right

  • Bootstrapping a new hybrid setup
  • Sites that don’t move much data
  • Disaster-recovery backup (below)
  • Multi-vendor SD-WAN overlays terminating into TGW Connect

VPN quirks

  • IPsec rekeying can cause brief per-tunnel drops; with BGP, traffic shifts to the second tunnel and back automatically
  • MTU: the internet path is 1500 by default, and IPsec overhead cuts the effective MTU to roughly 1436; clamp TCP MSS or make sure PMTUD works end-to-end
  • Cross-tunnel failover uses BGP; without BGP you’re static-route pinned to one tunnel unless you script failover
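The MTU arithmetic above can be made concrete. A rough sketch, assuming ~64 bytes of IPsec overhead (the exact figure varies with cipher and encapsulation mode, so treat these constants as ballpark):

```python
# Rough MTU/MSS budget for IPsec over a 1500-byte internet path.
PATH_MTU = 1500
IPSEC_OVERHEAD = 64      # assumed: outer IP header, ESP header/trailer, padding
INNER_IP_HEADER = 20
TCP_HEADER = 20

effective_mtu = PATH_MTU - IPSEC_OVERHEAD                  # ~1436, as in the text
mss_clamp = effective_mtu - INNER_IP_HEADER - TCP_HEADER   # value to clamp TCP MSS to

print(effective_mtu, mss_clamp)  # 1436 1396
```

If neither MSS clamping nor PMTUD is in place, full-size 1500-byte TCP segments get fragmented or silently dropped inside the tunnel.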

The Direct Connect side

DX is a physical, dedicated 802.1Q-trunked fibre link between your equipment and an AWS router (the “DX location” — typically a carrier-neutral facility like Equinix). Once established, you attach one or more Virtual Interfaces (VIFs):

| VIF type | Purpose |
|---|---|
| Private VIF | Reach a VPC — terminates on a VGW or Direct Connect Gateway (DXGW) |
| Public VIF | Reach AWS public services (S3, DynamoDB, etc.) over AWS's network — no internet traversal |
| Transit VIF | Terminates on a DXGW that fronts a TGW — scales to many VPCs, multi-region |

BGP is mandatory on every VIF. You run an eBGP session with AWS (your ASN peering against AWS's): you advertise your prefixes, and AWS advertises VPC or AWS public prefixes.

Connection types

  • Dedicated Connection — 1, 10, or 100 Gbps physical port allocated solely to you. Order directly from AWS.
  • Hosted Connection — fractional (50 Mbps up to 10 Gbps) provided by a DX Partner. Provisioned through the partner's portal. Faster to obtain, and available through many partners.

DX is unencrypted by default

The fibre is private to your carrier, but AWS doesn’t encrypt at L2/L3. If regulatory rules demand encryption:

  • MACsec — L2 encryption on 10/100 Gbps dedicated connections where supported
  • IPsec over DX — run a Site-to-Site VPN inside a Public VIF (effectively encrypted DX). Popular pattern for HIPAA/PCI.

Redundancy & hybrid patterns

Pattern 1 — DX primary, VPN backup

Most common production pattern:

 on-prem router ──DX──▶ DXGW ──▶ TGW ──▶ VPCs
       │
       └───── Internet ──VPN──▶ TGW  ◀── backup path

BGP local-preference / AS-path prepending makes DX the preferred path; VPN takes over automatically if DX drops. Near-zero RTO.
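The failover logic reduces to BGP best-path selection. A toy sketch of the two attributes this pattern leans on, highest LOCAL_PREF wins, then shortest AS_PATH (real BGP has many more tie-breakers; the ASNs below are illustrative):

```python
def best_path(paths):
    """Pick a route the way BGP would for this pattern:
    highest LOCAL_PREF first, then shortest AS_PATH."""
    return max(paths, key=lambda p: (p["local_pref"], -len(p["as_path"])))

dx_path  = {"via": "DX",  "local_pref": 200, "as_path": [65001]}
vpn_path = {"via": "VPN", "local_pref": 100, "as_path": [65001, 65001, 65001]}  # prepended

# DX wins while both routes exist:
assert best_path([dx_path, vpn_path])["via"] == "DX"
# If the DX session drops, its route is withdrawn and VPN takes over:
assert best_path([vpn_path])["via"] == "VPN"
```

The "automatic" part of the failover is exactly this: when the DX BGP session dies, its routes are withdrawn and the VPN routes are the only candidates left.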

Pattern 2 — DX with two locations

For DX-only HA, provision circuits at two DX locations (e.g. Ashburn + Atlanta) and, ideally, two routers at each site. This is the Maximum Resiliency model in AWS's Direct Connect Resiliency Toolkit, and what AWS recommends for critical workloads.

Pattern 3 — VPN-only with multiple connections + ECMP

A single IPsec tunnel caps at ~1.25 Gbps. On TGW with ECMP enabled, multiple VPNs load-balance — a cost-effective way to push a few Gbps without DX.
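Sizing this is simple arithmetic, assuming both tunnels of each connection run active/active under TGW ECMP:

```python
import math

TUNNEL_CAP_GBPS = 1.25   # approximate single-tunnel cap

def vpn_connections_needed(target_gbps, tunnels_per_connection=2):
    """VPN connections required when TGW ECMP runs every tunnel active/active."""
    per_connection = TUNNEL_CAP_GBPS * tunnels_per_connection
    return math.ceil(target_gbps / per_connection)

print(vpn_connections_needed(5.0))  # 2 connections ≈ 5 Gbps aggregate
```

Remember the per-flow caveat from earlier: this aggregate is only reachable when traffic consists of many flows.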

Pricing mental model

  • VPN: flat hourly charge per VPN connection + data-transfer-out at standard internet rates. Cheap to stand up.
  • DX: port-hours for the connection + data-transfer-out at DX rates (noticeably cheaper than internet egress). Upfront cost and lead time for the circuit.

At very high egress volumes (~terabytes/month), DX’s cheaper per-GB egress can offset the port fee. That’s the usual business case when DX is chosen for cost rather than performance.
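A sketch of that break-even calculation, with hypothetical prices (all three rates below are assumptions for illustration; look up current AWS pricing for your region and port speed):

```python
# Hypothetical list prices -- check current AWS pricing before relying on these.
INTERNET_EGRESS_PER_GB = 0.09   # assumed $/GB out to the internet
DX_EGRESS_PER_GB       = 0.02   # assumed $/GB out over DX
DX_PORT_PER_HOUR       = 0.30   # assumed 1 Gbps dedicated port
HOURS_PER_MONTH        = 730

def monthly_cost_internet(gb):
    return gb * INTERNET_EGRESS_PER_GB

def monthly_cost_dx(gb):
    return DX_PORT_PER_HOUR * HOURS_PER_MONTH + gb * DX_EGRESS_PER_GB

# Egress volume at which the cheaper per-GB rate pays off the port fee:
break_even_gb = (DX_PORT_PER_HOUR * HOURS_PER_MONTH) / (
    INTERNET_EGRESS_PER_GB - DX_EGRESS_PER_GB
)
print(round(break_even_gb))  # ~3129 GB/month under these assumed prices
```

Under these assumed numbers the crossover sits at roughly 3 TB/month of egress, which matches the "terabytes/month" rule of thumb above.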

When to choose which

Choose Site-to-Site VPN when:

  • You need connectivity this week
  • Throughput fits within a few hundred Mbps
  • Latency variability is tolerable
  • It’s a backup path, not the primary

Choose Direct Connect when:

  • Sustained high throughput (> 1 Gbps)
  • Jitter-sensitive workloads (VoIP, interactive terminals, replication)
  • Predictable monthly data-transfer bills
  • Compliance mandates non-internet paths
  • Hybrid is permanent architecture, not transitional

Use both when: the workload truly can’t tolerate DX downtime. Circuits do fail; VPN-as-backup is cheap insurance.

Common pitfalls

  1. DX ordered, but no redundant circuit. A single DX is a single point of failure. AWS SLAs only apply to the DX service, not the carrier circuit.
  2. VPN without BGP — static routing doesn’t failover between the two AWS-provided tunnels. Use BGP.
  3. Asymmetric routing between DX (primary) and VPN (backup). Stateful firewalls hate this. Align BGP metrics both directions.
  4. Forgetting DX is unencrypted. Dev team assumes “private circuit = secure”; auditors disagree.
  5. MTU mismatches. DX supports jumbo frames (MTU 9001 on private VIFs); the on-prem side must agree end-to-end or you get black holes.
  6. VIF ownership and hosted VIFs. On hosted connections, the partner owns the port; VIFs are sliced from that port. Organisational seams sometimes create delays.

Mental model

  • VPN = best-effort overlay over public internet with built-in IPsec. Fast to deploy, good enough for many.
  • DX = a dedicated WAN circuit terminated on AWS — think of it as a leased line into AWS’s backbone.
  • DXGW = the DX transit layer — fronts multiple VPCs / regions behind one DX link.
  • Together = the resiliency story AWS recommends for real production hybrid.

See also