
Writing a Kubernetes operator in Go

Controllers, reconcile loops, and the CRD lifecycle — explained by building one from scratch.

Everyone tells you operators are “just a reconcile loop.” That’s true, and also unhelpful the first time you stare at a blank main.go. So let’s build a real one — a controller that keeps a fixed number of pods alive — and watch where the abstractions leak.

The reconcile contract

The whole model is one function. The controller framework hands you a request (the namespace and name of an object that may have changed, never the change itself), you read the world, you nudge it toward the desired state, you return. No state machine, no queue to manage: the loop is the framework’s job.

main.go
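// Reconcile follows the shape of controller-runtime's reconcile.Reconciler.
// AppReconciler (assumed to embed client.Client) and App are assumed names.
func (r *AppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. read the desired state from the custom resource
    var app App
    if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
        // a deleted object is not an error; anything else gets retried
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    // 2. converge: create pods until we hit the target
    for i := live(&app); i < app.Spec.Replicas; i++ {
        if err := r.Create(ctx, podFor(&app)); err != nil {
            return ctrl.Result{}, err // returning the error requeues the request
        }
    }
    return ctrl.Result{}, nil
}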

The subtle part isn’t the code — it’s that reconcile must be idempotent. It runs again on every change, every resync, every restart. Write it like it’s the first time, every time.
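To make the idempotency point concrete, here is a minimal sketch that swaps the real API server for an in-memory map. Everything in it (Cluster, App, createPod, the creates counter) is a hypothetical stand-in invented for illustration, not part of any Kubernetes library. The point is only that Reconcile derives its work from what it observes, so repeated calls add nothing.

```go
package main

import "fmt"

// App is a stand-in for the custom resource: just a name and a desired count.
type App struct {
	Name     string
	Replicas int
}

// Cluster is a toy in-memory "world": pod names grouped by owning app.
type Cluster struct {
	pods    map[string][]string
	creates int // total create calls, so extra work would be visible
}

func NewCluster() *Cluster {
	return &Cluster{pods: make(map[string][]string)}
}

// live reports how many pods currently exist for the app.
func (c *Cluster) live(app App) int { return len(c.pods[app.Name]) }

// createPod adds one pod for the app and counts the call.
func (c *Cluster) createPod(app App) {
	name := fmt.Sprintf("%s-%d", app.Name, len(c.pods[app.Name]))
	c.pods[app.Name] = append(c.pods[app.Name], name)
	c.creates++
}

// Reconcile compares observed state with desired state and creates only the
// missing pods. It keeps no memory between calls.
func Reconcile(c *Cluster, app App) {
	for i := c.live(app); i < app.Replicas; i++ {
		c.createPod(app)
	}
}

func main() {
	c := NewCluster()
	app := App{Name: "demo", Replicas: 3}

	Reconcile(c, app) // first pass: creates 3 pods
	Reconcile(c, app) // resync: nothing to do
	Reconcile(c, app) // after a restart: still nothing to do

	fmt.Println(c.live(app), c.creates) // prints "3 3"
}
```

A version that created app.Replicas pods on every call, instead of counting from live, would look identical on the first run and double the pod count on the second. That is the bug idempotency rules out.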

A good operator is boring. It reads the world, makes one small correction, and goes back to sleep.
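The "loop is the framework's job" part can be pictured with a toy work queue. This is a rough model under stated assumptions, not controller-runtime's actual implementation: Queue, Run, and errTransient are names invented here, and the real queue also adds rate limiting and backoff. It shows two behaviours worth knowing: events for the same key collapse into one pending item, and a failed reconcile puts the key back for another try.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical error used to simulate one failed reconcile.
var errTransient = errors.New("transient failure")

// Queue is a toy work queue keyed by object name. Adding a key that is
// already waiting is a no-op, so a burst of events becomes one item.
type Queue struct {
	order   []string
	pending map[string]bool
}

func NewQueue() *Queue { return &Queue{pending: make(map[string]bool)} }

// Add enqueues key unless it is already pending.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Run drains the queue, calling reconcile once per key and re-adding a key
// whose reconcile returned an error. maxCalls bounds the toy loop.
func Run(q *Queue, reconcile func(key string) error, maxCalls int) int {
	calls := 0
	for calls < maxCalls {
		key, ok := q.Get()
		if !ok {
			break
		}
		calls++
		if err := reconcile(key); err != nil {
			q.Add(key) // retry later; a real queue would also back off
		}
	}
	return calls
}

func main() {
	q := NewQueue()
	// Three events for "demo" and one for "other" arrive before any work starts.
	for _, k := range []string{"demo", "demo", "other", "demo"} {
		q.Add(k)
	}

	failedOnce := false
	calls := Run(q, func(key string) error {
		if key == "other" && !failedOnce {
			failedOnce = true
			return errTransient // first attempt for "other" fails
		}
		fmt.Println("reconcile:", key)
		return nil
	}, 10)

	fmt.Println("calls:", calls) // prints "calls: 3"
}
```

Because the queue only remembers that a key needs attention, the reconcile function cannot rely on knowing what changed or how many times. That is the practical reason it has to work from observed state alone.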

Watching it work

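Apply the custom resource, scale the spec, and tail the logs. (kubectl scale works on a custom resource only when its CRD enables the scale subresource.) The controller does exactly one thing per event: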

~/op — zsh
~/op kubectl apply -f app.yaml
✓ app.graditya.dev/demo created
~/op kubectl scale app/demo --replicas=3
reconcile: live=0 want=3 → creating 3 pods
✓ converged in 240ms
Figure: k9s, watching the operator bring pods up to the desired count.

That’s the entire job. Everything else — finalizers, status conditions, owner references — is detail you add once the loop is solid.
