Kubernetes Fundamentals

Kubernetes (k8s) is a control loop. You declare “I want 3 replicas of this container, exposed on port 80, with a memory limit of 512Mi.” Kubernetes continually compares that declaration to what’s actually running and does whatever it takes to close the gap. Every feature — Deployments, Services, autoscalers, operators — is a reconciliation loop built on top of the same primitives. If you understand the control-loop model, the rest is vocabulary.

The 30-second mental model

```
   you declare desired state ──► apiserver ──► etcd
                                        │
                                        ▼
                                    controllers
                                        │
                                        │  ┌────── "what's actually running?"
                                        ▼  │
                                    scheduler          kubelet (on each node)
                                        │                   │
                                        └────►  node  ◄─────┘
                                                  │
                                           container runtime (containerd)
                                                  │
                                               pods run
```

  1. You (or a controller) submit a YAML object to the apiserver.
  2. It’s stored in etcd.
  3. Controllers watch the apiserver. They see “desired: 3 pods; actual: 2 pods” → they create a new pod object (not yet assigned to a node).
  4. The scheduler picks a node.
  5. The kubelet on that node tells the container runtime to start the pod.
  6. Forever: observe → diff → act → repeat.

Everything else is a variation on that theme.
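The loop itself is tiny. A toy sketch in Python — names are illustrative only; real controllers use client-go watches, work queues, and the apiserver, none of which appear here:

```python
# A toy reconciliation loop: the same observe -> diff -> act cycle every
# Kubernetes controller runs. start_pod/stop_pod stand in for apiserver calls.

def reconcile(desired_replicas, running_pods, start_pod, stop_pod):
    """One reconcile pass: bring the running set to the desired count."""
    actual = len(running_pods)
    if actual < desired_replicas:            # too few -> create the difference
        for _ in range(desired_replicas - actual):
            running_pods.append(start_pod())
    elif actual > desired_replicas:          # too many -> delete the surplus
        for _ in range(actual - desired_replicas):
            stop_pod(running_pods.pop())
    return running_pods                      # converged when len == desired

# Usage: simulate a Deployment scaling from 2 to 3 replicas.
counter = iter(range(100))
pods = ["web-a", "web-b"]
pods = reconcile(3, pods,
                 start_pod=lambda: f"web-{next(counter)}",
                 stop_pod=lambda p: None)
print(len(pods))  # 3
```

Real controllers re-run this on every change event and on a timer, which is why the system self-heals: a deleted pod is just a new diff to close.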

The cluster anatomy

Control plane

| Component | Job |
|---|---|
| kube-apiserver | REST API + authn/authz front door. Only component that talks to etcd. |
| etcd | The consistent key-value store holding all cluster state. |
| kube-scheduler | Picks a node for each unscheduled pod. |
| kube-controller-manager | Built-in controllers (Deployment, ReplicaSet, Node, ServiceAccount, …). |
| cloud-controller-manager | Cloud-specific bits (load balancers, routes, disks). |

Data plane (per node)

| Component | Job |
|---|---|
| kubelet | Talks to apiserver, ensures pods on the node match the spec. |
| container runtime | containerd / CRI-O — actually runs containers. |
| kube-proxy | Programs iptables / nftables / IPVS for Service virtual IPs (some CNIs replace it). |
| CNI plugin | Calico / Cilium / Flannel / … — provides pod networking. |

As a user, you don’t manage any of this directly. Managed offerings (EKS, GKE, AKS) run the control plane for you.

The core objects

Pod

The smallest deployable unit. Usually one container, but sometimes several containers that must live together (they share a network namespace and can share volumes).

  • Pods have their own IP, reachable from other pods.
  • Pods are ephemeral — they get replaced, not updated. You almost never create bare pods; you create a controller that creates pods.
  • Multi-container pods implement the sidecar pattern (a log shipper next to the app, a service-mesh proxy). Init containers are related but distinct: they run to completion — e.g. preparing state — before the main containers start.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      ports: [{ containerPort: 80 }]
```

Deployment (the workhorse)

Declaratively manages a ReplicaSet, which manages pods. Handles rolling updates and rollbacks.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports: [{ containerPort: 80 }]
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits:   { cpu: "500m", memory: "512Mi" }
          readinessProbe:
            httpGet: { path: /, port: 80 }
            periodSeconds: 5
```

Update the image → Deployment creates a new ReplicaSet with the new spec → rolls pods gradually → old ReplicaSet shrinks to 0.
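The pace of that rollout is tunable via the Deployment's strategy block. A sketch of the standard apps/v1 fields:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra pod above replicas during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
```

With maxUnavailable: 0 every new pod must pass its readiness probe before an old one is terminated — slower, but no capacity dip.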

Other workload controllers

| Kind | Purpose |
|---|---|
| StatefulSet | Pods with stable identity (app-0, app-1, app-2) and per-pod persistent storage. Databases. |
| DaemonSet | One pod per node (log collectors, CNI agents). |
| Job | Run-to-completion task. |
| CronJob | Scheduled Job. |
| ReplicaSet | You rarely create this directly; Deployment manages it. |
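As a taste of the non-Deployment controllers, a minimal CronJob — the image name is hypothetical:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"              # cron syntax: 02:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Job pods must be OnFailure or Never
          containers:
            - name: report
              image: myorg/report:1.0   # hypothetical image
```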

Service (the stable front door)

Pods come and go; their IPs change. A Service gives a stable virtual IP + DNS name that load-balances across the pods matching a selector.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports:
    - port: 80            # service port
      targetPort: 80      # container port
  type: ClusterIP         # default
```

Service types:

| Type | Reachable from |
|---|---|
| ClusterIP (default) | Inside the cluster only |
| NodePort | External via <node-ip>:<30000-32767> on every node |
| LoadBalancer | Cloud provisions an external LB, sends traffic to NodePorts |
| ExternalName | DNS CNAME to an external host — no proxying |
| Headless (clusterIP: None) | No virtual IP; DNS returns pod IPs directly (used by StatefulSets) |

In-cluster DNS: <service>.<namespace>.svc.cluster.local. Cross-namespace calls use the FQDN.
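A headless Service in full, for illustration — clusterIP: None is the only change from a normal ClusterIP Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None          # headless: DNS returns the pod IPs directly
  selector: { app: db }
  ports:
    - port: 5432
```

With a StatefulSet behind it, each pod also gets its own stable DNS name (db-0.db.<namespace>.svc.cluster.local).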

Ingress / Gateway API (L7 entry)

Service type=LoadBalancer is one external LB per service — expensive. Ingress (L7) shares one LB across many services, routing by host / path.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port: { number: 80 }
  tls:
    - hosts: [web.example.com]
      secretName: web-tls
```

An Ingress controller (nginx-ingress, Traefik, the AWS Load Balancer Controller, Istio) is the pod actually handling HTTP. The Ingress object is its config.

Gateway API is the modern successor — separates roles (platform / app team), supports TCP/UDP/TLS, and better mirrors real L7 use cases. Use it on new clusters.
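For comparison, the same host/path routing expressed with Gateway API — a sketch assuming a Gateway named shared-gateway already exists (typically owned by the platform team):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web
spec:
  parentRefs:
    - name: shared-gateway        # the platform-owned Gateway (assumed to exist)
  hostnames: [web.example.com]
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: web
          port: 80
```

The role split is the point: the platform team owns the Gateway (LB, TLS, listeners); app teams attach HTTPRoutes to it.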

ConfigMap and Secret

| | ConfigMap | Secret |
|---|---|---|
| Purpose | Non-sensitive config | Sensitive data |
| Storage | etcd, plain | etcd, base64 (not encryption!) |
| At-rest encryption | Optional (via KMS plugin) | Enable it |
| Consume as | Env var, file mount, or in-pod API | Same |

```yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: myapp-config }
data:
  LOG_LEVEL: info
  CONFIG_YAML: |
    feature_x: true
```

Mount into a pod:

```yaml
envFrom:
  - configMapRef: { name: myapp-config }
```

Or as files:

```yaml
volumes:
  - name: config
    configMap: { name: myapp-config }
volumeMounts:
  - name: config
    mountPath: /etc/myapp
```

Secrets look the same but use kind: Secret + base64-encoded values. Don’t commit them to git — use GitOps Fundamentals patterns (SOPS / Sealed Secrets / External Secrets).
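A minimal Secret for illustration — stringData lets you write plain text and the apiserver stores it base64-encoded; the value here is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata: { name: myapp-secrets }
type: Opaque
stringData:                  # plain text in; stored base64-encoded
  DB_PASSWORD: change-me     # placeholder — never commit real values
```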

Namespaces

Logical partitioning of the cluster: kube-system, default, and whatever you create. Most objects are namespaced. Namespaces are not a security boundary by default — they’re an organisational boundary. Use NetworkPolicy + RBAC on top for real tenancy.
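Namespaces do become useful boundaries once you attach policy to them. For example, a ResourceQuota (the namespace name is hypothetical) caps what a team can consume:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "10"       # total CPU requests across all pods in the ns
    requests.memory: 20Gi
    pods: "50"
```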

Storage

Containers are ephemeral. Persistent storage is modelled by:

  • PersistentVolume (PV) — a piece of storage (EBS volume, NFS share, Ceph RBD). Cluster-scoped.
  • PersistentVolumeClaim (PVC) — a request for storage by a pod. Namespaced.
  • StorageClass — a policy class that dynamically provisions PVs from PVCs (e.g. “gp3” on EKS).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data }
spec:
  accessModes: [ReadWriteOnce]
  resources: { requests: { storage: 20Gi } }
  storageClassName: gp3
```
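Consuming the claim from a pod spec — a fragment, assuming the PVC named data above:

```yaml
spec:
  containers:
    - name: db
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data      # the PVC defined above
```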

Access modes (what the volume supports):

| Mode | Meaning |
|---|---|
| RWO | One node RW (block volumes, the common case) |
| ROX | Many nodes RO |
| RWX | Many nodes RW (NFS, EFS, CephFS) |
| RWOP | One pod RW (newer; a real single-pod lock) |

CSI (Container Storage Interface) is the plugin spec. Every cloud / storage vendor has a CSI driver.

Scheduling knobs

Kubernetes picks nodes for you, but you can influence placement:

| Mechanism | Purpose |
|---|---|
| requests / limits | Minimum/maximum CPU + memory. Requests drive scheduling; limits cap runtime. |
| nodeSelector | Simple label match (disktype=ssd) |
| Affinity / anti-affinity | Richer rules (“don’t co-locate two replicas on the same node”) |
| Taints / tolerations | Nodes “repel” pods unless the pod tolerates the taint (GPU nodes, dedicated tenants) |
| PriorityClass | Preemption policy under contention |
| PodDisruptionBudget | During voluntary disruption (drain, upgrade), keep ≥ N pods available |
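As one concrete example from the table, pod anti-affinity that spreads replicas across nodes — a pod-spec fragment:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels: { app: web }
        topologyKey: kubernetes.io/hostname   # "no two app=web pods on one node"
```

The required… variant hard-fails scheduling if it can't be satisfied; preferredDuringSchedulingIgnoredDuringExecution is the soft version.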

Autoscaling

Three separate layers:

| Scaler | Scales | Signal |
|---|---|---|
| HPA (Horizontal Pod Autoscaler) | Replicas of a Deployment | CPU, memory, or custom metrics |
| VPA (Vertical Pod Autoscaler) | requests/limits of a Deployment | Observed usage |
| Cluster Autoscaler / Karpenter | Nodes in the cluster | Unschedulable pods |

HPA is the one you use most. Karpenter (AWS) is replacing Cluster Autoscaler in new clusters because it picks instance types intelligently.
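A minimal HPA targeting the web Deployment from earlier — autoscaling/v2 syntax, scaling on average CPU utilisation relative to requests:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
```

Utilisation is measured against requests, which is one more reason to set requests accurately.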

Networking model (the 4 rules)

Kubernetes mandates:

  1. Every pod gets a unique, routable IP.
  2. Pods can talk to any other pod without NAT.
  3. Pods can talk to nodes and vice-versa without NAT.
  4. The IP a pod sees for itself = the IP others see for it.

How this is implemented is up to the CNI plugin:

| CNI | Approach |
|---|---|
| Flannel | Simple overlay (VXLAN) |
| Calico | BGP peering, native routing; supports NetworkPolicy |
| Cilium | eBPF-based; strong L3–L7 NetworkPolicy + observability; replacing kube-proxy |
| AWS VPC CNI | Pods get real VPC IPs (ENI secondary IPs) |
| Azure CNI / GKE | Similar cloud-native approaches |

NetworkPolicy — east-west firewalling

Default: all pods can talk to all pods. NetworkPolicy changes that — but only if your CNI enforces it.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: db-allow-from-app }
spec:
  podSelector: { matchLabels: { app: db } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector: { matchLabels: { app: api } }
      ports:
        - protocol: TCP
          port: 5432
```

Translate: “Only pods with label app=api can reach pods with app=db on TCP 5432.”
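Policies are additive, so the usual pattern is to start from default-deny and then open holes explicitly. A namespace-wide default-deny for ingress traffic:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny-ingress }
spec:
  podSelector: {}           # empty selector = every pod in the namespace
  policyTypes: [Ingress]    # no ingress rules listed -> nothing is allowed in
```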

RBAC

Kubernetes auth is two phases: authn (who are you? usually a certificate, OIDC token, or ServiceAccount JWT) then authz (what can you do? usually RBAC).

Four kinds:

| Kind | Scope |
|---|---|
| Role | Permissions in one namespace |
| RoleBinding | Binds a Role to a subject, in one namespace |
| ClusterRole | Cluster-wide permissions (e.g. read nodes) |
| ClusterRoleBinding | Binds a ClusterRole cluster-wide |

Built-in ClusterRoles like view, edit, admin, cluster-admin are usually enough. Least privilege: bind view everywhere; edit / admin only where needed; never cluster-admin outside break-glass.
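Least privilege in practice — binding the built-in edit ClusterRole into a single namespace via a RoleBinding (the namespace and ServiceAccount names here are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer
  namespace: staging                  # hypothetical namespace
subjects:
  - kind: ServiceAccount
    name: ci                          # hypothetical ServiceAccount
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                          # built-in; the RoleBinding scopes it to this ns
```

Note the trick: a ClusterRole referenced from a RoleBinding grants its permissions in that one namespace only.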

kubectl — the daily tool

```sh
kubectl get pods -A                        # all pods, all namespaces
kubectl get all -n myns                    # everything in a ns
kubectl describe pod web-xxxx              # detailed state + events
kubectl logs -f web-xxxx -c app            # follow logs (specific container if multi)
kubectl exec -it web-xxxx -- /bin/sh       # shell inside
kubectl port-forward svc/web 8080:80       # tunnel from your machine to a service
kubectl apply -f .                         # apply manifests in dir
kubectl diff -f .                          # preview changes
kubectl rollout status deploy/web
kubectl rollout undo deploy/web            # rollback
kubectl top pod                            # resource usage (needs metrics-server)
kubectl explain deployment.spec.template   # discover API fields
kubectl config get-contexts                # switch clusters
kubectl config use-context prod
```

Set KUBE_EDITOR (used by kubectl edit), learn one dashboard (Lens, k9s), and build aliases — they save real time.

Packaging: Helm / Kustomize

Raw YAML is fine at small scale; bigger apps need templating:

  • Helm — “package manager for Kubernetes.” Charts bundle templated YAML + values. Versioned, releaseable, install/upgrade/rollback. Downside: Go templating is ugly at scale.
  • Kustomize — patches + overlays, no templating. Base manifests + per-env overlays. Built into kubectl apply -k. Cleaner mental model for most use cases.

Many teams: Helm for third-party software (databases, ingress controllers); Kustomize for their own apps.
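A sketch of a per-environment Kustomize overlay — the file layout is illustrative:

```yaml
# overlays/prod/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base               # the shared manifests
images:
  - name: nginx
    newTag: "1.27"           # pin the image tag for this env
patches:
  - path: replicas.yaml      # e.g. bump replicas for prod
```

kubectl apply -k overlays/prod renders base + patches and applies the result — no templating, just YAML merging.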

Operators / Custom Resources

Kubernetes is extensible: define a new kind via a CustomResourceDefinition (CRD), run a controller that reconciles it. This is how ArgoCD, Flux, cert-manager, Prometheus Operator, Postgres operators work. An “operator” is a controller with domain knowledge — it doesn’t just keep replicas up, it does “upgrade this Postgres 14 cluster to 15 correctly.”

This is the extension model. Almost every serious platform is a CRD + controller.
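A minimal CRD for a hypothetical Database kind — the controller that reconciles it is a separate program and not shown:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com          # must be <plural>.<group>
spec:
  group: example.com
  names: { kind: Database, plural: databases, singular: database }
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                version: { type: string }   # e.g. "15"
```

Once applied, kubectl get databases works like any built-in resource; the operator watches those objects and does the domain-specific work.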

Day-2 concerns

  • Upgrades. Control plane first, then nodes (drain + replace). Managed offerings do this for you.
  • Observability. Metrics (Prometheus + Grafana), logs (Loki / Elastic / Cloud), traces (Tempo / Jaeger), events (kubectl get events). See Observability.
  • Cost control. Resource requests drive billing — wrong-sized requests waste money or cause evictions. Tools: Kubecost, Goldilocks, VPA recommendations.
  • Policy. Kyverno / OPA Gatekeeper enforce organisation-wide rules (must set requests, must use signed images, no :latest tags, etc.).

When NOT to use Kubernetes

  • Under ~5 services / one team → Docker Compose or managed PaaS is simpler.
  • Single big stateful app (one Postgres) → a VM and good backups beat a StatefulSet + operator in operational simplicity.
  • You don’t have people to run it. Even “managed” K8s needs real knowledge. If nobody on the team is on-call for it, lean serverless / PaaS.

See also