Relational vs NoSQL Databases
The fundamental split in the data world. Both are valid, neither is universally better, and the right choice depends on the shape of your data and the questions you ask of it.
Relational (SQL) databases
A relational database stores data in tables with rows and columns. Tables relate to each other via foreign keys. You interrogate the data with SQL — a declarative language where you say what you want, not how to get it.
Examples: PostgreSQL, MySQL/MariaDB, Microsoft SQL Server, Oracle, SQLite.
What makes them “relational”
- Schema-first — you define columns and their types up front. A row must match the schema.
- Relations — the “parent → child” links are first-class, enforced with foreign keys.
- Normalization — breaking data into many small tables to avoid duplication. One customer record, one address record, linked by ID.
- ACID Transactions — all-or-nothing, correctness-first.
- SQL — the common query language that has outlived nearly every competing vision of the future since the 1970s.
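The properties above can be seen in miniature with Python's built-in sqlite3 (the tables and data are invented for illustration; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when asked.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Schema-first: columns and types are declared up front.
conn.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    )""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 42.0)")

# The relation is first-class and enforced: an order pointing at a
# nonexistent customer is rejected by the database itself.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 10.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same normalization idea scales up: one customer row, many order rows, all linked by ID rather than duplicated.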
Strengths
- Consistency — reads reflect the most recently committed writes (subject to your chosen isolation level).
- Joins — combining data from multiple tables in one query. The killer feature of SQL.
- Mature tooling — decades of optimisation, backup tools, replication patterns, tuning knowledge.
Trade-offs
- Scaling writes is hard — scaling up (bigger server) works; scaling out (many servers sharing the write load) is hard because ACID guarantees fight horizontal distribution.
- Schema changes are friction — altering a huge table can lock or rewrite it; adding a column to a billion-row table can take hours on some engines.
- Not great for non-tabular data — JSON, time-series, graph-like data fit awkwardly (though modern Postgres handles them surprisingly well).
NoSQL databases
“NoSQL” = “Not Only SQL.” It’s a catch-all for everything that doesn’t fit the relational model. There are really four sub-families:
1. Document databases
Store data as self-contained, JSON-like documents. Each document can have different fields. Joins are limited or absent, and there is no fixed schema.
- Examples: MongoDB, Couchbase, AWS DocumentDB
- Good for: content systems, product catalogs, anything where each record has variable fields
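A rough sketch of the document model, using plain Python dicts in place of a real document database (`catalog` and `find` are hypothetical names):

```python
# A document "collection" as a list of dicts: each record is self-contained
# and fields may vary per document.
catalog = [
    {"_id": 1, "name": "T-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "Mug", "capacity_ml": 350},               # different fields
    {"_id": 3, "name": "Poster", "dimensions": {"w": 50, "h": 70}},
]

# Queries filter on whatever fields happen to exist.
def find(collection, **criteria):
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

mugs = find(catalog, name="Mug")
print(mugs[0]["capacity_ml"])  # → 350
```

No schema migration was needed to give the mug a `capacity_ml` field the other products lack.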
2. Key-value stores
The simplest model — a giant dictionary mapping key → value. Extremely fast. Some stores also offer rich operations on values (lists, sets, counters).
- Examples: Redis, Memcached, AWS DynamoDB (partly)
- Good for: caches, session storage, simple lookups, real-time counters
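A toy sketch of a key-value store with per-key expiry, loosely in the spirit of Redis's EXPIRE (`TTLCache` is an invented class, not a real client library; the clock is injectable so expiry can be shown without sleeping):

```python
import time

class TTLCache:
    """A toy key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ttl=None):
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._data[key]   # lazy expiry on read
            return default
        return value

now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("session:42", {"user": "ada"}, ttl=30)
print(cache.get("session:42"))   # → {'user': 'ada'}
now[0] = 31.0                    # advance the fake clock past the TTL
print(cache.get("session:42"))   # → None
```

Session storage is exactly this pattern: get-by-key, with the store reaping expired entries for you.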
3. Wide-column stores
Like a relational table but with trillions of rows and flexible columns per row. Designed for massive scale.
- Examples: Apache Cassandra, ScyllaDB, Google Bigtable, HBase
- Good for: time-series data, log ingestion, write-heavy workloads at huge scale
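A minimal sketch of the wide-column idea — a partition key that owns a group of rows, ordered by a clustering key, with cheap append-only writes (all names here are invented for illustration):

```python
from collections import defaultdict

# Partition key (e.g. a sensor id) -> list of (clustering key, columns).
# Writes are appends; reads scan a single partition by key range.
table = defaultdict(list)

def write(partition, ts, **columns):
    table[partition].append((ts, columns))   # append-only, write-optimised

def read_range(partition, start, end):
    return [(ts, cols) for ts, cols in table[partition] if start <= ts <= end]

write("sensor-1", 100, temp=21.5)
write("sensor-1", 160, temp=21.9)
write("sensor-2", 100, temp=19.0)

print(read_range("sensor-1", 90, 150))   # → [(100, {'temp': 21.5})]
```

The real systems distribute partitions across many nodes by hashing the partition key, which is what makes the write path scale horizontally.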
4. Graph databases
First-class nodes and edges. Queries traverse relationships efficiently.
- Examples: Neo4j, ArangoDB, Amazon Neptune
- Good for: social graphs, fraud detection, recommendation engines, anything where “how are these things connected?” is the main question
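The "how are these things connected?" question is a traversal. A minimal breadth-first sketch over an invented follower graph (a graph database makes this a first-class query instead of repeated self-joins):

```python
from collections import deque

# A tiny social graph as adjacency sets.
follows = {
    "ann": {"bob", "cat"},
    "bob": {"dan"},
    "cat": {"dan"},
    "dan": set(),
}

def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of connections."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None   # no connection

print(shortest_path(follows, "ann", "dan"))
```

Fraud detection and recommendations are variations on the same traversal, just with weights and filters on the edges.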
The CAP theorem (the big trade-off)
In a distributed system you can guarantee at most two of these three:
- Consistency — every read sees the most recent write
- Availability — every request gets a response
- Partition tolerance — the system keeps working when parts can’t talk to each other
Network partitions are unavoidable in practice, so the real choice is CP vs AP:
- CP systems (most SQL, MongoDB with strong consistency) — prefer to refuse requests rather than serve stale data during a partition
- AP systems (Cassandra, DynamoDB) — keep serving, accept that some reads may return older values temporarily (“eventual consistency”)
Traditional SQL chooses CP by default. Many NoSQL systems choose AP by default.
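A toy simulation of the AP choice: a write is accepted on one replica, a read from the other replica is briefly stale, and a background sync converges them. All names and mechanics here are illustrative, not a real replication protocol:

```python
# Two replicas of the same key-value data.
replica_a = {}
replica_b = {}

def write(key, value):
    replica_a[key] = value          # accepted by whichever node is reachable

def read_from(replica, key):
    return replica.get(key)         # may return a stale (or missing) value

def anti_entropy():
    """Background sync: both replicas converge on the union of writes."""
    replica_b.update(replica_a)
    replica_a.update(replica_b)

write("cart:7", ["book"])
stale = read_from(replica_b, "cart:7")    # replica B hasn't seen the write yet
anti_entropy()
fresh = read_from(replica_b, "cart:7")    # replicas have converged
print(stale, fresh)   # → None ['book']
```

A CP system would instead refuse the read (or the write) until the replicas could agree.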
When to pick which
Pick relational if
- Your data has a clear, stable schema with relations
- You need transactions across multiple rows/tables (ACID Transactions)
- Your queries involve joining multiple entities (“all orders from customers in Georgia with total > $500”)
- You’re not at hyperscale (which is most of us)
Pick NoSQL if
- Your data is document-shaped, variable, or schema-less
- You need to scale writes horizontally across many nodes
- Your access pattern is simple and predictable (get-by-key, time-series append)
- Latency matters more than strong consistency
- You have a specialised workload: graph, time-series, search, cache
Real-world answer: both
Most production systems use multiple databases:
- Relational for core transactional data (orders, users, accounts)
- Redis/Memcached for cache
- Elasticsearch for search
- Time-series DB (InfluxDB, TimescaleDB) for metrics
- Object storage for blobs
This is called polyglot persistence.
Postgres is eating the world
Worth a special note: PostgreSQL has absorbed most of what the early NoSQL era was about:
- JSON/JSONB columns for document-like data
- Full-text search
- Time-series via TimescaleDB extension
- Geospatial via PostGIS
- Array types, range types, materialised views
For many workloads that ten years ago required specialised NoSQL, Postgres is now “good enough and you already know SQL.” It’s a defensible default choice.
See also
- ACID Transactions
- Database Indexing
- 🖥️ Server Infrastructure MOC — storage and backup considerations