# Kubernetes Fundamentals
Kubernetes (k8s) is a control loop. You declare “I want 3 replicas of this container, exposed on port 80, with a memory limit of 512Mi.” Kubernetes continually compares that declaration to what’s actually running and does whatever it takes to close the gap. Every feature — Deployments, Services, autoscalers, operators — is a reconciliation loop built on top of the same primitives. If you understand the control-loop model, the rest is vocabulary.
## The 30-second mental model
```
you declare desired state ──► apiserver ──► etcd
                                  │
                                  ▼
                             controllers
                                  │
                                  │   ┌────── "what's actually running?"
                                  ▼   │
            scheduler          kubelet (on each node)
                │                 │
                └────► node ◄─────┘
                         │
            container runtime (containerd)
                         │
                     pods run
```
- You (or a controller) submit a YAML object to the apiserver.
- It’s stored in etcd.
- Controllers watch the apiserver. They see “desired: 3 pods; actual: 2 pods” → they create a new pod object, which starts out unscheduled.
- The scheduler picks a node.
- The kubelet on that node tells the container runtime to start the pod.
- Forever: observe → diff → act → repeat.
Everything else is a variation on that theme.
## Cluster anatomy

### Control plane
| Component | Job |
|---|---|
| kube-apiserver | REST API + authn/authz front door. Only component that talks to etcd. |
| etcd | The consistent key-value store holding all cluster state. |
| kube-scheduler | Picks a node for each unscheduled pod. |
| kube-controller-manager | Built-in controllers (Deployment, ReplicaSet, Node, ServiceAccount, …). |
| cloud-controller-manager | Cloud-specific bits (load balancers, routes, disks). |
### Data plane (per node)
| Component | Job |
|---|---|
| kubelet | Talks to apiserver, ensures pods on the node match the spec. |
| container runtime | containerd / CRI-O — actually runs containers. |
| kube-proxy | Programs iptables / nftables / IPVS for Service virtual IPs (some CNIs replace it). |
| CNI plugin | Calico / Cilium / Flannel / … — provides pod networking. |
As a user, you don’t manage any of this directly. Managed offerings (EKS, GKE, AKS) run the control plane for you.
## The core objects

### Pod
The smallest deployable unit. Usually one container but can be multiple containers that must live together (sharing network + storage namespaces).
- Pods have their own IP, reachable from other pods.
- Pods are ephemeral — they get replaced, not updated. You almost never create bare pods; you create a controller that creates pods.
- Multi-container pods enable the sidecar pattern (a log shipper next to an app, a service-mesh proxy) and init containers that prepare state before the main container starts.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx:1.27
    ports: [{ containerPort: 80 }]
```

### Deployment (the workhorse)
Declaratively manages a ReplicaSet, which manages pods. Handles rolling updates and rollbacks.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports: [{ containerPort: 80 }]
        resources:
          requests: { cpu: "100m", memory: "128Mi" }
          limits: { cpu: "500m", memory: "512Mi" }
        readinessProbe:
          httpGet: { path: /, port: 80 }
          periodSeconds: 5
```

Update the image → the Deployment creates a new ReplicaSet with the new spec → rolls pods over gradually → the old ReplicaSet shrinks to 0.
### Other workload controllers
| Kind | Purpose |
|---|---|
| StatefulSet | Pods with stable identity (app-0, app-1, app-2) and per-pod persistent storage. Databases. |
| DaemonSet | One pod per node (log collectors, CNI agents). |
| Job | Run-to-completion task. |
| CronJob | Scheduled Job. |
| ReplicaSet | You rarely create this directly; Deployment manages it. |
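As an illustration, a minimal run-to-completion Job might look like this (the name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate            # hypothetical name
spec:
  backoffLimit: 3          # retry a failed pod up to 3 times
  template:
    spec:
      restartPolicy: Never # Jobs must not restart in place
      containers:
      - name: migrate
        image: myapp:1.0           # placeholder image
        command: ["./migrate.sh"]  # placeholder command
```

Wrap a schedule around the same template (`kind: CronJob`, `spec.schedule: "0 3 * * *"`) and you have a CronJob.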
### Service (the stable front door)
Pods come and go; their IPs change. A Service gives a stable virtual IP + DNS name that load-balances across the pods matching a selector.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports:
  - port: 80        # service port
    targetPort: 80  # container port
  type: ClusterIP   # default
```

Service types:
| Type | Reachable from |
|---|---|
| ClusterIP (default) | Inside the cluster only |
| NodePort | External via `<node-ip>:<30000-32767>` on every node |
| LoadBalancer | Cloud provisions an external LB, sends traffic to NodePorts |
| ExternalName | DNS CNAME to an external host — no proxying |
| Headless (`clusterIP: None`) | No virtual IP; DNS returns pod IPs directly (used by StatefulSets) |
In-cluster DNS: `<service>.<namespace>.svc.cluster.local`. Cross-namespace calls use the FQDN.
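For example, a headless Service for the `web` pods skips the virtual IP entirely (the name here is hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-headless   # hypothetical name
spec:
  clusterIP: None      # headless: DNS returns the pod IPs directly
  selector: { app: web }
  ports:
  - port: 80
```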
### Ingress / Gateway API (L7 entry)

Service `type: LoadBalancer` means one external LB per Service — expensive. Ingress (L7) shares one LB across many Services, routing by host / path.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
  - host: web.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port: { number: 80 }
  tls:
  - hosts: [web.example.com]
    secretName: web-tls
```

An ingress controller (ingress-nginx, Traefik, AWS Load Balancer Controller, Istio) is the pod actually handling HTTP. The Ingress object is its config.
Gateway API is the modern successor — separates roles (platform / app team), supports TCP/UDP/TLS, and better mirrors real L7 use cases. Use it on new clusters.
### ConfigMap and Secret
| | ConfigMap | Secret |
|---|---|---|
| Purpose | Non-sensitive config | Sensitive data |
| Storage | etcd, plain | etcd, base64 (not encryption!) |
| At-rest encryption | Optional (via KMS plugin) | Enable it |
| Consume as | Env var, file mount, or in-pod API | Same |
```yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: myapp-config }
data:
  LOG_LEVEL: info
  CONFIG_YAML: |
    feature_x: true
```

Inject into a pod as environment variables:

```yaml
envFrom:
- configMapRef: { name: myapp-config }
```

Or as files:

```yaml
volumes:
- name: config
  configMap: { name: myapp-config }
volumeMounts:
- name: config
  mountPath: /etc/myapp
```

Secrets look the same but use `kind: Secret` + base64-encoded values. Don’t commit them to git — use GitOps Fundamentals patterns (SOPS / Sealed Secrets / External Secrets).
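A sketch of the Secret equivalent: `stringData` lets you write plain values, which the apiserver stores base64-encoded (the name and key here are hypothetical):

```yaml
apiVersion: v1
kind: Secret
metadata: { name: myapp-secret }   # hypothetical name
type: Opaque
stringData:                        # plain text here; stored base64-encoded
  DB_PASSWORD: s3cr3t              # illustrative value only
```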
### Namespaces
Logical partitioning of the cluster: kube-system, default, and whatever you create. Most objects are namespaced. Namespaces are not a security boundary by default — they’re an organisational boundary. Use NetworkPolicy + RBAC on top for real tenancy.
## Storage
Containers are ephemeral. Persistent storage is modelled by:
- PersistentVolume (PV) — a piece of storage (EBS volume, NFS share, Ceph RBD). Cluster-scoped.
- PersistentVolumeClaim (PVC) — a request for storage by a pod. Namespaced.
- StorageClass — a policy class that dynamically provisions PVs from PVCs (e.g. “gp3” on EKS).
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data }
spec:
  accessModes: [ReadWriteOnce]
  resources: { requests: { storage: 20Gi } }
  storageClassName: gp3
```

Access modes (what the volume supports):
| Mode | Meaning |
|---|---|
| RWO | One node RW (block volumes, default) |
| ROX | Many nodes RO |
| RWX | Many nodes RW (NFS, EFS, CephFS) |
| RWOP | One pod RW (new; real single-pod lock) |
CSI (Container Storage Interface) is the plugin spec. Every cloud / storage vendor has a CSI driver.
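A pod (or, more typically, a StatefulSet template) consumes a claim by name. A sketch using the `data` PVC above (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata: { name: db }       # hypothetical pod
spec:
  containers:
  - name: db
    image: postgres:16       # example image
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim: { claimName: data }   # the PVC defined above
```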
## Scheduling knobs
Kubernetes picks nodes for you, but you can influence placement:
| Mechanism | Purpose |
|---|---|
| requests / limits | Minimum/maximum CPU + memory. Requests drive scheduling; limits cap runtime. |
| nodeSelector | Simple label match (disktype=ssd) |
| Affinity / anti-affinity | Richer rules (“don’t co-locate two replicas on the same node”) |
| Taints / tolerations | Nodes “repel” pods unless the pod tolerates the taint (GPU nodes, dedicated tenants) |
| PriorityClass | Preemption policy under contention |
| PodDisruptionBudget | During voluntary disruption (drain, upgrade), keep ≥ N pods available |
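A sketch combining two of these knobs: a toleration for a hypothetical `gpu=true:NoSchedule` taint, plus anti-affinity that keeps `app: web` replicas on separate nodes (all labels and keys are illustrative):

```yaml
spec:
  tolerations:                     # allow scheduling onto tainted GPU nodes
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  affinity:
    podAntiAffinity:               # don't co-locate two replicas on one node
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels: { app: web }
        topologyKey: kubernetes.io/hostname
```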
## Autoscaling
Three separate layers:
| Scaler | Scales | Signal |
|---|---|---|
| HPA (Horizontal Pod Autoscaler) | Replicas of a Deployment | CPU, memory, or custom metrics |
| VPA (Vertical Pod Autoscaler) | requests/limits of a Deployment | Observed usage |
| Cluster Autoscaler / Karpenter | Nodes in the cluster | Unschedulable pods |
HPA is the one you use most. Karpenter (AWS) is replacing Cluster Autoscaler in new clusters because it picks instance types intelligently.
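A minimal HPA for the `web` Deployment above could look like this (autoscaling/v2 API; the 70% CPU target is an arbitrary example):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef:                  # what to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu                    # needs metrics-server installed
      target: { type: Utilization, averageUtilization: 70 }
```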
## Networking model (the 4 rules)
Kubernetes mandates:
- Every pod gets a unique, routable IP.
- Pods can talk to any other pod without NAT.
- Pods can talk to nodes and vice-versa without NAT.
- The IP a pod sees for itself = the IP others see for it.
How this is implemented is up to the CNI plugin:
| CNI | Approach |
|---|---|
| Flannel | Simple overlay (VXLAN) |
| Calico | BGP peering, native routing; supports NetworkPolicy |
| Cilium | eBPF-based; strong L3-L7 NetworkPolicy + observability; replacing kube-proxy |
| AWS VPC CNI | Pods get real VPC IPs (ENI secondary IPs) |
| Azure CNI / GKE | Similar cloud-native approaches |
### NetworkPolicy — east-west firewalling
Default: all pods can talk to all pods. NetworkPolicy changes that — but only if your CNI enforces it.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: db-allow-from-app }
spec:
  podSelector: { matchLabels: { app: db } }
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector: { matchLabels: { app: api } }
    ports:
    - protocol: TCP
      port: 5432
```

Translated: “Only pods labelled app=api can reach pods labelled app=db, and only on TCP 5432.”
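A common companion is a namespace-wide default deny, after which you allowlist specific flows; a sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny-ingress }
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes: [Ingress]   # no ingress rules listed = deny all inbound
```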
## RBAC
Kubernetes auth is two phases: authn (who are you? usually a certificate, OIDC token, or ServiceAccount JWT) then authz (what can you do? usually RBAC).
Four kinds:
| Kind | Scope |
|---|---|
| Role | Permissions in one namespace |
| RoleBinding | Binds a Role to a subject, in one namespace |
| ClusterRole | Cluster-wide permissions (e.g. read nodes) |
| ClusterRoleBinding | Binds a ClusterRole cluster-wide |
Built-in ClusterRoles like `view`, `edit`, `admin`, `cluster-admin` are usually enough. Least privilege: bind `view` everywhere; `edit` / `admin` only where needed; never `cluster-admin` outside break-glass.
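For instance, read-only access for a hypothetical `dev-team` group in one namespace is a single RoleBinding to the built-in `view` ClusterRole (names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-view      # hypothetical name
  namespace: staging       # hypothetical namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view               # built-in read-only role
subjects:
- kind: Group
  name: dev-team           # hypothetical group from your identity provider
  apiGroup: rbac.authorization.k8s.io
```

Referencing a ClusterRole from a RoleBinding grants those permissions only within that namespace.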
## kubectl — the daily tool

```shell
kubectl get pods -A                       # all pods, all namespaces
kubectl get all -n myns                   # everything in a namespace
kubectl describe pod web-xxxx             # detailed state + events
kubectl logs -f web-xxxx -c app           # follow logs (specific container if multi)
kubectl exec -it web-xxxx -- /bin/sh      # shell inside
kubectl port-forward svc/web 8080:80      # tunnel from your machine to a service
kubectl apply -f .                        # apply manifests in dir
kubectl diff -f .                         # preview changes
kubectl rollout status deploy/web
kubectl rollout undo deploy/web           # rollback
kubectl top pod                           # resource usage (needs metrics-server)
kubectl explain deployment.spec.template  # discover API fields
kubectl config get-contexts               # list clusters/contexts
kubectl config use-context prod           # switch clusters
```

Set `KUBE_EDITOR`, learn one dashboard (Lens, k9s), and use aliases; they save real time.
## Packaging: Helm / Kustomize
Raw YAML is fine at small scale; bigger apps need templating:
- Helm — “package manager for Kubernetes.” Charts bundle templated YAML + values. Versioned, releasable, install/upgrade/rollback. Downside: Go templating gets ugly at scale.
- Kustomize — patches + overlays, no templating. Base manifests + per-env overlays. Built into kubectl (`kubectl apply -k`). A cleaner mental model for most use cases.
Many teams: Helm for third-party software (databases, ingress controllers); Kustomize for their own apps.
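A sketch of a Kustomize layout: a shared base plus a production overlay that patches the replica count of the `web` Deployment (paths are illustrative):

```yaml
# overlays/prod/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base               # the shared manifests
patches:
- patch: |-
    - op: replace
      path: /spec/replicas
      value: 5
  target:
    kind: Deployment
    name: web
```

Rendered and applied with `kubectl apply -k overlays/prod`.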
## Operators / Custom Resources
Kubernetes is extensible: define a new kind via a CustomResourceDefinition (CRD), run a controller that reconciles it. This is how ArgoCD, Flux, cert-manager, Prometheus Operator, Postgres operators work. An “operator” is a controller with domain knowledge — it doesn’t just keep replicas up, it does “upgrade this Postgres 14 cluster to 15 correctly.”
This is the extension model. Almost every serious platform is a CRD + controller.
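To make this concrete, here is a minimal CRD sketch registering a hypothetical `Backup` kind (the group and schema are invented for illustration); a controller you write would then reconcile `Backup` objects:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com               # hypothetical API group
  scope: Namespaced
  names:
    kind: Backup
    plural: backups
    singular: backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule: { type: string }   # e.g. a cron expression
```

After this is applied, `kubectl get backups` works like any built-in resource.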
## Day-2 concerns
- Upgrades. Control plane first, then nodes (drain + replace). Managed offerings do this for you.
- Observability. Metrics (Prometheus + Grafana), logs (Loki / Elastic / Cloud), traces (Tempo / Jaeger), events (`kubectl get events`). See Observability.
- Cost control. Resource requests drive billing — wrong-sized requests waste money or cause evictions. Tools: Kubecost, Goldilocks, VPA recommendations.
- Policy. Kyverno / OPA Gatekeeper enforce organisation-wide rules (must set requests, must use signed images, no `:latest` tags, etc.).
## When NOT to use Kubernetes
- Under ~5 services / one team → Docker Compose or managed PaaS is simpler.
- Single big stateful app (one Postgres) → a VM and good backups beat a StatefulSet + operator in operational simplicity.
- You don’t have people to run it. Even “managed” K8s needs real knowledge. If nobody on the team is on-call for it, lean serverless / PaaS.