DevOps Fundamentals
DevOps is not a job title, a tool, or a team. It’s a set of practices for shortening the feedback loop between “we want a change” and “the change is running in production, working.” Tools and teams exist to support the practice — they are not the practice itself.
The problem DevOps exists to solve
Before DevOps, two teams with competing incentives:
- Dev rewarded for shipping features fast → writes code and throws it over the wall.
- Ops rewarded for uptime → resists every change because every change is risk.
Result: long release cycles (months), ticket ping-pong, blame games after every outage, production state no one can reproduce. Everybody was doing their job and the overall system was broken.
DevOps answers: tear down the wall. Shared goal (value delivered to users), shared tooling (code in git, infra as code, CI/CD), shared responsibility (you build it, you run it).
The three ways (from “The Phoenix Project”)
A useful mental model:
- Flow — optimise left-to-right, from idea to production. Remove handoffs, reduce batch size, make the pipeline visible.
- Feedback — shorten feedback loops. Fast tests. Monitoring that alerts early. Users get fixes in hours, not weeks.
- Continuous learning — blameless postmortems, chaos engineering, game days, experimentation as a first-class activity.
Every DevOps practice is fundamentally one of these three.
What DevOps is (in practice)
A working DevOps setup usually includes all of these:
1. Version control for everything
Not just application code. Everything diffable in git:
- Application code
- Infrastructure as code (Automation-IaC, Terraform, Ansible)
- CI/CD pipeline definitions (`.github/workflows`, `.gitlab-ci.yml`, `Jenkinsfile`)
- Kubernetes manifests
- Documentation, runbooks, this wiki
- Config files (under `/etc/`, kept in a repo and applied by Ansible)
If it’s not in git, it doesn’t exist — because you can’t diff, review, rollback, or audit it.
2. CI — continuous integration
Every commit runs an automated pipeline: build, lint, test, security scan. Fast (< 10 min), reliable (no flaky tests), and mandatory (can’t merge if red). See CI-CD Fundamentals.
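Such a pipeline can be sketched as a GitHub Actions workflow. This is a minimal illustration, not a drop-in file: the `make` targets are placeholders for whatever lint/test/build commands your stack uses.

```yaml
# .github/workflows/ci.yml — minimal sketch; adapt steps to your stack
name: ci
on:
  pull_request:
  push:
    branches: [main]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    timeout-minutes: 10        # enforce the "< 10 min" budget
    steps:
      - uses: actions/checkout@v4
      - run: make lint         # placeholder: your linter
      - run: make test         # placeholder: your test suite
      - run: make build        # placeholder: produce the artifact
```

Making it mandatory is a repo setting, not a pipeline setting: require the `build-and-test` check to pass before merge via branch protection.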
3. CD — continuous delivery / deployment
Delivery: every commit produces a deployable artifact; humans decide when to release. Deployment: every commit that passes CI goes to production automatically.
Both require reliable tests, automated rollback, and feature flags (to decouple “deploy” from “release”).
4. Infrastructure as Code
Servers, networks, databases, DNS records — defined in text files, applied by tools (Ansible Fundamentals, Terraform, Pulumi). Cattle, not pets: any machine is disposable because recreating it is one `terraform apply` away.
See IaC Fundamentals and Automation-IaC.
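For flavour, a minimal Terraform sketch — the provider, region, AMI, and names are all illustrative, not a recommendation:

```hcl
# Minimal Terraform sketch — provider, region, and names are illustrative
terraform {
  required_providers {
    aws = { source = "hashicorp/aws" }
  }
}

provider "aws" {
  region = "eu-west-1"
}

# Cattle, not pets: destroy this instance and `terraform apply` recreates it
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"   # placeholder AMI
  instance_type = "t3.micro"
  tags = { Name = "web" }
}
```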
5. Observability
You don’t know your system unless you can see it. Metrics (Prometheus), logs (Loki, ELK, Cloud Logging), traces (Jaeger, Tempo, Datadog). See Observability.
The rule: if it alerts, there’s a runbook. If there’s no runbook, the alert is noise.
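The runbook rule can be enforced mechanically by putting the runbook link in the alert itself. A sketch as a Prometheus alerting rule — metric name, threshold, and URL are illustrative; `runbook_url` is a common annotation convention, not a built-in field:

```yaml
# Prometheus alerting rule — metric, threshold, and URL are illustrative
groups:
  - name: api
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "API 5xx rate above 5% for 10 minutes"
          runbook_url: https://wiki.example.com/runbooks/high-error-rate
```

If you can't fill in `runbook_url`, that's the signal the alert is noise.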
6. “You build it, you run it”
Werner Vogels’ famous line. Developers carry pagers for their services. It aligns incentives — sloppy code wakes you up at 3 AM, so you write better code.
Not every org can do this; at minimum, dev and ops share a Slack channel and one on-call rota.
Tooling landscape (the short version)
| Slice | Examples |
|---|---|
| Version control | Git (+ GitHub / GitLab / Bitbucket) |
| CI/CD | GitHub Actions, GitLab CI, Jenkins, CircleCI, Argo Workflows |
| IaC | Terraform / OpenTofu, Ansible, Pulumi, CloudFormation, Bicep |
| Containers | Docker, Podman, containerd |
| Orchestration | Kubernetes (+ k3s, OpenShift, EKS/GKE/AKS) |
| GitOps | ArgoCD, Flux |
| Templating / packaging | Helm, Kustomize |
| Artifact registries | Docker Hub, ECR / GCR / ACR, Artifactory, Nexus |
| Secrets | Vault, cloud KMS, SOPS, sealed-secrets — see Secrets Management |
| Monitoring | Prometheus + Grafana, Datadog, New Relic, CloudWatch |
| Logging | Loki, Elastic, Splunk, Cloud Logging |
| Tracing | Jaeger, Tempo, Honeycomb, OTEL |
| Alerting / paging | PagerDuty, Opsgenie, Alertmanager |
| Feature flags | LaunchDarkly, Unleash, GrowthBook |
| Chaos | Chaos Monkey, Litmus, Gremlin |
Don’t learn them all. Learn one per slice well; the concepts transfer.
DORA metrics — how to tell if you’re “doing DevOps”
The DORA research (now annual “State of DevOps” report) distilled performance into four metrics:
| Metric | What it measures | Elite target |
|---|---|---|
| Deployment frequency | How often you ship | On-demand, multiple per day |
| Lead time for changes | Commit → production | Under 1 hour |
| Change failure rate | % deploys causing problems | 0–15 % |
| Time to restore service | Outage → resolution | Under 1 hour |
A fifth was added later: reliability (SLOs met).
These four together beat any single metric, because they trade off: you can ship fast with a high failure rate (cheating) or have rock-solid changes by never shipping (cheating in the other direction). Watch all four; optimising one at the expense of another is gaming the metric.
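The four metrics fall straight out of a deploy log. A toy sketch with made-up records (not a DORA tool — field names and data are invented for illustration):

```python
from datetime import datetime, timedelta

# Toy deploy log (made-up data): each record is
# (commit_time, deploy_time, caused_incident, restored_time)
deploys = [
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 9, 40),  False, None),
    (datetime(2024, 5, 1, 11, 0), datetime(2024, 5, 1, 11, 50), True,
     datetime(2024, 5, 1, 12, 20)),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 10, 30), False, None),
]
days_observed = 2

# Deployment frequency: deploys per day
deploy_frequency = len(deploys) / days_observed                    # 1.5/day

# Lead time for changes: commit -> production (median)
lead_times = sorted(d[1] - d[0] for d in deploys)
median_lead_time = lead_times[len(lead_times) // 2]                # 40 min

# Change failure rate: fraction of deploys causing incidents
change_failure_rate = sum(d[2] for d in deploys) / len(deploys)    # 1/3

# Time to restore service: deploy -> restored, for failed deploys only
restore_times = [d[3] - d[1] for d in deploys if d[2]]             # [30 min]
```

The hard part in practice is not the arithmetic but agreeing on what counts as "a deploy" and "an incident".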
GitOps — the purest form
GitOps is DevOps with a specific discipline: a git repo is the declared state of the system, and a controller (ArgoCD, Flux) continuously reconciles actual state to match.
Properties:
- No `kubectl apply` from laptops. Ever.
- Changes flow through PR → merge → controller.
- Drift is visible (diff of declared vs actual) and auto-corrected.
- Rollback is `git revert`.
See GitOps Fundamentals. This is where Idempotence and Declarative vs Imperative Automation become critical — GitOps only works because the controllers converge on declared state.
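Concretely, the "declared state + controller" pairing looks like an ArgoCD `Application` — repo URL, paths, and names below are placeholders:

```yaml
# ArgoCD Application — repo URL, paths, and names are placeholders
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deploy.git
    targetRevision: main
    path: apps/my-service
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # auto-correct drift back to declared state
```

`selfHeal` is the "drift is auto-corrected" property; `prune` makes deletion flow through git too.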
Deployment strategies
How do you actually put new code in production without downtime?
| Strategy | How it works | Rollback |
|---|---|---|
| Recreate | Stop v1, start v2 | Slow, with downtime |
| Rolling | Replace instances one-by-one | Revert & roll back the same way |
| Blue/Green | Two full environments; swap the load balancer | Swap back instantly |
| Canary | Send 1% → 5% → 25% → 100% of traffic to v2 | Stop the canary |
| Shadow / dark launch | v2 receives traffic but responses are thrown away | Turn off shadowing |
| Feature flags | Deploy v2 with features off; toggle per user / cohort | Flip flag off |
Modern prod is usually: rolling for infra (managed by k8s), canary + feature flags for features.
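The mechanism behind both canaries and per-cohort flags is usually deterministic percentage bucketing. A sketch — `in_rollout` is a hypothetical helper, not a real flag-library API:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Hypothetical helper: deterministic percentage bucketing.

    Hash (feature, user) into one of 100 stable buckets; the user is in
    the rollout if their bucket is below `percent`. Because buckets are
    stable, ramping 1% -> 5% -> 25% only ever adds users, never flips
    anyone back out mid-rollout.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Ramping is monotonic: the 5% cohort is a subset of the 25% cohort
users = [f"user-{i}" for i in range(100)]
cohort_5 = {u for u in users if in_rollout(u, "new-checkout", 5)}
cohort_25 = {u for u in users if in_rollout(u, "new-checkout", 25)}
assert cohort_5 <= cohort_25
```

Hashing on `(feature, user)` rather than `user` alone keeps cohorts independent across features, so the same users aren't guinea pigs for everything.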
Culture, not tools
A common failure mode: a company buys Jenkins + Docker + Terraform, changes no incentives or structures, and declares itself "doing DevOps." It isn't.
The things that actually move the needle:
- Small batches. Merge small PRs often. Huge PRs = huge risk = slow review = slow feedback.
- Blameless postmortems. People aren’t the cause — systems are. Write postmortems that change systems.
- Stop-the-line culture. Anyone can halt a deploy or escalate a risk without fear.
- On-call sanity. Pages should be rare, actionable, and owned. A 3 AM page with no runbook is a system failure, not a human one.
- Automation as discipline. If you did it twice manually, the third time is a script.
Where DevOps ends and SRE / Platform begins
Three adjacent words that confuse people:
- DevOps — the cultural practice and set of engineering techniques.
- SRE (Site Reliability Engineering) — Google’s implementation of DevOps. Strong emphasis on SLOs, error budgets, toil reduction, running production as a software problem.
- Platform Engineering — building internal paved-road tooling so product teams don’t each reinvent CI/CD / infra. “Platform as a product.”
They overlap heavily. Describe your day-to-day work and I usually can't tell which title is on the door.
The honest reality
DevOps can absolutely regress. Signs:
- Shipping fast but the change failure rate climbs → tests are theatre, not covering real risk.
- Every service runs differently because “we had a reason” each time → you’ve lost standardisation; platform engineering is needed.
- On-call burnout → alert volume is too high, runbooks don’t exist, or failure modes aren’t being fixed.
- “DevOps team” sitting between Dev and Ops → congratulations, you invented a new wall.
DevOps is a direction, not a destination. You’re always one organisational change or one technology shift from regressing.