BGP — Border Gateway Protocol
The routing protocol of the internet — and, increasingly, of the data center and the cloud. Path-vector, policy-driven, TCP-based.
Why it exists
Interior routing protocols (OSPF, IS-IS, EIGRP) optimise for fast convergence on a trusted topology. BGP optimises for policy, scale, and trust boundaries between autonomous systems. Two problems IGPs can’t solve:
- Scale — the global routing table carries ~1M+ prefixes. OSPF/IS-IS flood every change to every router and rerun SPF, which would not hold up at that size.
- Policy — “prefer ISP A for outbound, never transit my AS, don’t accept routes from this peer.” IGPs route by metric; BGP routes by policy.
Core concept — path vector
Each route carries the full list of ASes it has traversed (AS_PATH). Loops are prevented by rejecting any update whose AS_PATH already contains your own ASN. Compare:
| Class | Examples | Decision based on |
|---|---|---|
| Distance-vector | RIP, EIGRP | Hop count / composite metric |
| Link-state | OSPF, IS-IS | Full topology map + SPF |
| Path-vector | BGP | Full AS path + policy attributes |
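The AS_PATH loop check described above fits in a few lines. This is a minimal sketch, not any router's implementation. It assumes a hypothetical `Route` type whose `as_path` tuple lists the most recent AS first, and it uses documentation ASNs (64496–64511, RFC 5398). The same prepend step is where AS_PATH prepending (repeating your own ASN) happens.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical route type for illustration
    prefix: str
    as_path: tuple[int, ...]  # leftmost ASN is the most recent hop


def accept_update(route: Route, my_asn: int) -> bool:
    """Loop prevention: drop any update whose AS_PATH already contains our ASN."""
    return my_asn not in route.as_path


def advertise_ebgp(route: Route, my_asn: int, prepend: int = 1) -> Route:
    """Prepend our ASN (optionally several times) before sending to an eBGP peer."""
    return Route(route.prefix, (my_asn,) * prepend + route.as_path)


learned = Route("203.0.113.0/24", (64500, 64501))
sent = advertise_ebgp(learned, 64510)
print(sent.as_path)                # (64510, 64500, 64501)
print(accept_update(sent, 64510))  # False: the route came back to AS 64510
print(accept_update(sent, 64499))  # True
```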
eBGP vs iBGP
| | eBGP | iBGP |
|---|---|---|
| Between | Different ASes | Same AS |
| TTL | 1 (default) | 255 |
| Admin distance (Cisco) | 20 (preferred) | 200 |
| Next-hop on advertise | Rewritten to self | Preserved (common pitfall) |
| Loop prevention | AS_PATH check | No re-advertisement between iBGP peers |
The “no re-advertisement” rule forces a full mesh of iBGP sessions within an AS — which is what route reflectors and confederations exist to solve.
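The advertisement rules in the eBGP/iBGP table can be sketched as a single export function. This is a simplified model under stated assumptions: the `Route` fields, the `"ebgp"`/`"ibgp"` labels and the `export` name are illustrative, and real implementations have exceptions this ignores (for example third-party next hops on a shared subnet, or route reflectors relaxing the iBGP rule).

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple[int, ...]
    learned_from: str  # "ebgp" or "ibgp" (illustrative labels)


def export(route: Route, peer: str, my_asn: int, my_ip: str) -> Optional[Route]:
    """Apply the eBGP/iBGP advertisement rules; None means 'do not send'."""
    if peer == "ebgp":
        # eBGP: next hop rewritten to self, own ASN prepended to AS_PATH
        return replace(route, next_hop=my_ip, as_path=(my_asn,) + route.as_path)
    if peer == "ibgp":
        # iBGP split horizon: an iBGP-learned route is never sent to another iBGP peer
        if route.learned_from == "ibgp":
            return None
        # Next hop and AS_PATH pass through unchanged (the next-hop pitfall)
        return route
    raise ValueError(f"unknown peer type: {peer!r}")


r = Route("198.51.100.0/24", "192.0.2.1", (64500,), "ebgp")
print(export(r, "ibgp", 64510, "10.255.0.1").next_hop)  # 192.0.2.1
```

The `None` branch is exactly why a plain iBGP design needs a full mesh: a route learned over iBGP stops one hop in.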
Path attributes (the policy knobs)
Evaluated top-down in the best-path algorithm:
- Weight (Cisco-proprietary; local to router) — higher wins
- Local Preference — higher wins; propagated inside AS via iBGP
- Locally originated — prefer routes you originated
- AS_PATH length — shorter wins
- Origin — IGP preferred over EGP, EGP over Incomplete
- MED (Multi-Exit Discriminator) — lower wins; hint to neighbor AS about preferred entry point
- eBGP over iBGP
- IGP metric to next-hop — lower wins
- Oldest path / Router ID / Neighbor IP — tiebreakers
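The attribute list above can be sketched as a pairwise comparison that stops at the first attribute that differs. This is a simplified model, not any vendor's code: field names and defaults are illustrative, the oldest-path and neighbor-IP tiebreakers are omitted, and because MED is only compared within one neighbor AS, reducing a list pairwise can depend on input order (the problem vendors address with options such as deterministic MED).

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce
from ipaddress import IPv4Address

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower rank is preferred


@dataclass
class Path:
    # Illustrative fields and defaults
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple[int, ...] = ()
    origin: str = "igp"
    med: int = 0
    neighbor_as: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def prefer(a: Path, b: Path) -> Path:
    """Return the better of two paths, walking the attributes top-down."""
    # Each step: (value for a, value for b, True if the higher value wins)
    steps = [
        (a.weight, b.weight, True),
        (a.local_pref, b.local_pref, True),
        (a.locally_originated, b.locally_originated, True),
        (len(a.as_path), len(b.as_path), False),
        (ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin], False),
    ]
    if a.neighbor_as == b.neighbor_as:
        # By default MED is only compared between paths from the same neighbor AS
        steps.append((a.med, b.med, False))
    steps += [
        (a.ebgp, b.ebgp, True),
        (a.igp_metric, b.igp_metric, False),
        (int(IPv4Address(a.router_id)), int(IPv4Address(b.router_id)), False),
    ]
    for va, vb, higher_wins in steps:
        if va != vb:
            return a if (va > vb) == higher_wins else b
    return a


def best_path(paths: list[Path]) -> Path:
    return reduce(prefer, paths)


long_but_preferred = Path(local_pref=200, as_path=(64500, 64501, 64502), neighbor_as=64500)
short = Path(as_path=(64503,), neighbor_as=64503)
print(best_path([long_but_preferred, short]) is long_but_preferred)  # True
```

Local Preference is checked before AS_PATH length, so it overrides a shorter path; that ordering is why it is the outbound-traffic knob.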
Practical rule of thumb:
- Influence outbound traffic → Local Preference
- Influence inbound traffic → AS_PATH prepending or MED (MED only helps when you have multiple links to the same neighbor AS)
Session states
Idle → Connect → Active → OpenSent → OpenConfirm → Established
Active is misleading — it means “actively trying to establish a TCP session but not there yet.” Most debugging happens between Active and Established.
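A table-driven sketch makes the Connect/Active loop visible. This is the commonly taught simplification, not the full RFC 4271 FSM (the RFC's exact handling, e.g. with DelayOpen, differs in detail); the event names are illustrative labels, and any event not in the table is treated as an error that resets to Idle.

```python
from __future__ import annotations

# Simplified BGP session FSM: (state, event) -> next state.
# Event names are illustrative labels, not RFC 4271 event identifiers.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_up"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "connect_retry_expired"): "Connect",
    ("Active", "tcp_up"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def step(state: str, event: str) -> str:
    # Anything not in the table (NOTIFICATION, hold timer expiry, bad OPEN...)
    # is treated as an error that resets the session to Idle
    return TRANSITIONS.get((state, event), "Idle")


def run(events: list[str]) -> list[str]:
    history = ["Idle"]
    for event in events:
        history.append(step(history[-1], event))
    return history


print(run(["start", "tcp_failed", "connect_retry_expired",
           "tcp_up", "open_received", "keepalive_received"]))
# ['Idle', 'Connect', 'Active', 'Connect', 'OpenSent', 'OpenConfirm', 'Established']
```

A session stuck cycling Connect → Active → Connect usually means the TCP connection itself never comes up (reachability, ACLs, wrong peer address).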
Scalability — solving the iBGP mesh
In an AS with N routers, full iBGP mesh = N×(N−1)/2 sessions. Doesn’t scale.
- Route Reflectors (RFC 4456) — one or more RRs re-advertise iBGP-learned routes to their clients; clients peer only with the RRs. RRs can be nested hierarchically.
- Confederations (RFC 5065) — split the AS into sub-ASes with eBGP between them; looks like one AS externally.
Route reflectors win in practice. Confederations are rare outside tier-1 ISPs.
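The session arithmetic makes the difference concrete. This sketch assumes one common design: a single cluster whose RRs are fully meshed with each other, with every client peering with every RR for redundancy.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions for a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int) -> int:
    """Sessions when the RRs are fully meshed with each other and every
    other router (a client) peers with every RR. Assumes a single cluster."""
    clients = n - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors


for n in (10, 100, 500):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, 2))
# 10 45 17
# 100 4950 197
# 500 124750 997
```

Full mesh grows quadratically, while the RR design grows linearly with the number of clients.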
When to use BGP
- Between organisations — the only real answer (ISP peering).
- Inside large data centers — “BGP in the DC” design using unnumbered eBGP to the ToR (Microsoft; Clos fabrics; see RFC 7938).
- Cloud hybrid — every site-to-site VPN / Direct Connect / ExpressRoute terminates with BGP.
- SD-WAN overlay
- Not for small enterprise networks — OSPF is simpler and adequate.
BGP in the cloud
- AWS Direct Connect / Site-to-Site VPN — you speak eBGP with AWS’s Virtual Private Gateway or Transit Gateway. AWS is AS 7224 (or 64512–65534 for private).
- Azure ExpressRoute / VPN Gateway — eBGP sessions on both the primary and secondary connections.
- Communities are the usual way to tag routes so the cloud can apply policy (e.g. “don’t advertise to secondary region”).
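A standard community (RFC 1997) is a 32-bit value, conventionally written `ASN:value` with the ASN in the high 16 bits. The sketch below encodes and decodes that form and applies a tag-based export filter. `NO_EXPORT` and `NO_ADVERTISE` are the RFC 1997 well-known values; the `64500:100` "skip secondary region" tag is hypothetical, since each cloud provider publishes its own community values.

```python
from __future__ import annotations

# RFC 1997 well-known communities
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode(community: str) -> int:
    """'ASN:value' -> 32-bit standard community (ASN in the high 16 bits)."""
    asn, value = (int(part) for part in community.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("standard communities are two 16-bit halves")
    return (asn << 16) | value


def decode(raw: int) -> str:
    return f"{raw >> 16}:{raw & 0xFFFF}"


# Hypothetical policy tag: "don't advertise to the secondary region"
SKIP_SECONDARY = encode("64500:100")


def export_to_secondary(routes: list[tuple[str, set[int]]]) -> list[str]:
    """Keep only prefixes that do not carry the hypothetical tag."""
    return [prefix for prefix, communities in routes if SKIP_SECONDARY not in communities]


print(decode(NO_EXPORT))  # 65535:65281
```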
Security
BGP’s original sin is trust. It believes what peers tell it. Real-world incidents (Pakistan-vs-YouTube 2008, Rostelecom 2017) were BGP hijacks.
Mitigations:
- Prefix filters on every peering — accept only expected prefixes
- Max-prefix limits — drop the session if a peer floods you
- RPKI / ROA — cryptographic origin validation; lets you reject routes whose origin AS isn’t authorised by a ROA to announce that prefix
- BGPsec (RFC 8205) — path validation, very low deployment
- MD5 / TCP-AO — peer authentication
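RPKI origin validation (RFC 6811) sorts each route into valid, invalid or not-found by comparing it with validated ROA payloads (VRPs): a prefix, a max length and an authorised origin ASN. The sketch below follows that logic with a hypothetical `VRP` type and in-memory list. Real routers fetch VRPs from an RPKI validator, and dropping invalids is a separate policy decision.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    # Hypothetical in-memory validated ROA payload
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    route = ip_network(prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # Only VRPs of the same address family that cover the route are relevant
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if vrp.origin_asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered but no VRP matched -> invalid; no covering VRP at all -> not-found
    return "invalid" if covered else "not-found"


vrps = [VRP("203.0.113.0/24", 24, 64500), VRP("198.51.100.0/22", 23, 64501)]
print(validate_origin("203.0.113.0/24", 64666, vrps))  # invalid: wrong origin AS
print(validate_origin("198.51.100.0/24", 64501, vrps))  # invalid: longer than maxLength
```

The second case shows why a tight maxLength matters: a more-specific announcement is flagged even when the origin AS is right.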
Common gotchas
- iBGP next-hop unchanged — the BGP next-hop is the eBGP peer’s IP, which iBGP peers may not be able to reach. Fix with `next-hop-self` on the border router, or redistribute the peering links into the IGP.
- Synchronisation (legacy, off by default in modern code) — historically required an IGP route to match before BGP would install a route.
- Missing IGP route to the peer IP — BGP TCP session won’t come up. Obvious in hindsight, subtle in practice.
- MED not propagated beyond neighbor AS — by design.
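The first and third gotchas share a root cause: an address BGP depends on (the next hop, or the peer IP) has no covering route in the IGP. A longest-prefix-match lookup shows the effect. This is a sketch with a hypothetical IGP table (prefix mapped to outgoing interface); a `None` result means the BGP route is unusable, or the session cannot form.

```python
from __future__ import annotations

from ipaddress import ip_address, ip_network


def resolve_next_hop(next_hop: str, igp_routes: dict[str, str]) -> str | None:
    """Longest-prefix match of an address against IGP routes (prefix -> interface)."""
    addr = ip_address(next_hop)
    best = None
    for prefix, iface in igp_routes.items():
        net = ip_network(prefix)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, iface)
    return best[1] if best else None


# Hypothetical IGP table on an internal router: loopbacks are known,
# the border router's eBGP peering subnet is not
igp = {"10.255.0.0/24": "eth0", "10.255.0.1/32": "eth1"}
print(resolve_next_hop("192.0.2.1", igp))   # None: eBGP peer IP kept as next hop
print(resolve_next_hop("10.255.0.1", igp))  # eth1: border loopback after next-hop-self
```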
See also
- OSPF Area explained — IGP counterpart
- Routing — cross-cutting concept
- L3