Writing a Kubernetes operator in Go
Controllers, reconcile loops, and the CRD lifecycle — explained by building one from scratch.
Everyone tells you operators are “just a reconcile loop.” That’s true, and also unhelpful the first time you stare at a blank main.go. So let’s build a real one — a controller that keeps a fixed number of pods alive — and watch where the abstractions leak.
The reconcile contract
The whole model is one function. Kubernetes hands you a request, you read the world, you nudge it toward the desired state, you return. No state machine, no queue to manage — the loop is the framework’s job.
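// Simplified sketch. Reconciler is a placeholder name for the controller type
// behind r; live and podFor are helpers that are not shown here.
func (r *Reconciler) reconcile(ctx context.Context, req Request) error {
	// 1. read the desired state from the CRD
	app, err := r.Get(ctx, req.Name)
	if err != nil {
		// a missing object means it was deleted: nothing to converge
		return client.IgnoreNotFound(err)
	}
	// 2. converge: create pods until we hit the target
	for i := live(app); i < app.Spec.Replicas; i++ {
		if err := r.Create(ctx, podFor(app)); err != nil {
			return err // give up this pass; the next run recounts and resumes
		}
	}
	return nil
}
The subtle part isn’t the code; it’s that reconcile must be idempotent. It runs again on every change, every resync, every restart. That is why the loop starts counting at live(app) rather than at zero: a rerun creates only what is still missing. Write it like it’s the first time, every time.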
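To make the idempotency point concrete without a cluster, here is a minimal sketch that swaps the API server for an in-memory struct. Cluster, its fields, and this cut-down reconcile are hypothetical stand-ins invented for illustration, not part of any Kubernetes library; the only thing carried over from the real loop is that it counts what exists before creating anything.

```go
package main

import "fmt"

// Cluster is a hypothetical in-memory stand-in for the API server.
type Cluster struct {
	Replicas int      // desired state (the spec)
	Pods     []string // observed state
	nextID   int      // counter used only to give pods distinct names
}

// reconcile creates pods until the observed count matches the spec and
// returns how many it created. It derives everything from current state,
// so calling it again with nothing to do changes nothing.
func reconcile(c *Cluster) (created int) {
	for len(c.Pods) < c.Replicas {
		c.nextID++
		c.Pods = append(c.Pods, fmt.Sprintf("pod-%d", c.nextID))
		created++
	}
	return created
}

func main() {
	c := &Cluster{Replicas: 3}
	fmt.Println(reconcile(c)) // prints 3: the first pass fills the whole gap
	fmt.Println(reconcile(c)) // prints 0: a resync with nothing to do is a no-op
	c.Pods = c.Pods[:2]       // simulate one pod dying
	fmt.Println(reconcile(c)) // prints 1: only the missing pod is replaced
}
```

A version that created Replicas pods on every call would pass the first check and then double the pod count on the first resync, which is the failure mode the rule exists to prevent.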
A good operator is boring. It reads the world, makes one small correction, and goes back to sleep.
Watching it work
Apply the CRD, scale the spec, and tail the logs. The controller does exactly one thing per event: it compares the live pod count with app.Spec.Replicas and creates whatever is missing.
That’s the entire job. Everything else — finalizers, status conditions, owner references — is detail you add once the loop is solid.
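The other half of the loop, the part the framework owns, can be pictured as a work queue keyed by object name. The sketch below is a toy, single-threaded stand-in and not controller-runtime's implementation; Queue, Add, and Drain are hypothetical names. It mimics one property that client-go's work queue documents: a key added several times before it is processed is processed once. That is another reason reconcile has to read current state rather than trust the event that woke it.

```go
package main

import "fmt"

// Queue is a hypothetical stand-in for the framework's work queue.
// Duplicate keys collapse, so a burst of events means one reconcile.
type Queue struct {
	keys    []string        // keys waiting to be processed, in arrival order
	pending map[string]bool // membership check for keys
}

func NewQueue() *Queue { return &Queue{pending: map[string]bool{}} }

// Add enqueues key unless it is already waiting.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.keys = append(q.keys, key)
}

// Drain calls reconcile once per waiting key, in order, and returns the
// number of calls. Errors are only reported; this toy does not retry.
func (q *Queue) Drain(reconcile func(key string) error) (calls int) {
	for len(q.keys) > 0 {
		key := q.keys[0]
		q.keys = q.keys[1:]
		delete(q.pending, key)
		calls++
		if err := reconcile(key); err != nil {
			fmt.Println("reconcile failed:", key, err)
		}
	}
	return calls
}

func main() {
	q := NewQueue()
	// Three events for the same object arrive before the worker runs.
	q.Add("default/web")
	q.Add("default/web")
	q.Add("default/web")
	q.Add("default/api")

	calls := q.Drain(func(key string) error {
		fmt.Println("reconciling", key)
		return nil
	})
	fmt.Println("calls:", calls) // prints "calls: 2"
}
```

A real queue also handles concurrency, rate limiting, and retries with backoff; the sketch leaves those out to keep the deduplication visible.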