Puppets and Octopi: Why Top-Down Orchestration Hits a Wall

You can do orchestration right. You can do it really, really right. You can have the best runbooks, the cleanest pipelines, the most disciplined team. It doesn’t matter. At a certain scale, things start to go wrong.

This isn’t a skill problem. It’s a physics problem.

Two Models of Getting Things Done

There are fundamentally two ways to manage complex systems at scale. I’ve been thinking about this for years, and the best analogies I’ve found are puppets and octopi.

The Puppet Show

Imperative, action-based orchestration is a puppet show. You have puppet masters pulling strings, and puppets doing what they’re told. The master says “move your left arm,” and the puppet moves its left arm. The master says “walk forward,” and the puppet walks forward. Every action originates from above. Every motion requires a string.

This is how most people think about managing systems, because it’s intuitive. You want something done, you tell something to do it. Shell scripts. Ansible playbooks. kubectl apply. helm install. A CI pipeline that SSHs into a server and runs commands. A human typing things into a terminal. An operator clicking buttons in a UI.

It works. I’m not saying it doesn’t work. For a single puppet, it works beautifully. For five puppets, it works fine. For ten, you’re getting pretty good at this. For fifty, you’re sweating. For five hundred, you’re drowning.

Here’s why: every puppet needs strings. Every string goes to a master. The master has two hands. You can add more masters, but now the masters need to coordinate with each other. “I’m moving puppet 47’s left leg, don’t touch puppet 48 yet because they’re linked.” Now you have masters managing masters. Strings everywhere. The whole thing becomes a nightmare of coordination, timing, and entanglement.

Too many cooks in the kitchen. Not because any individual cook is bad. The problem is the kitchen.

The Octopus

Octopi have brains in their tentacles. Not metaphorically — literally. Each arm has a cluster of neurons that can process sensory input and execute motor commands independently. The central brain sends high-level intent (“grab that crab”), and the tentacle figures out the details on its own. How to reach, how to grip, how to navigate around obstacles — that’s all handled locally.

This is declarative convergence. You state what you want. Autonomous agents figure out how to get there. Nobody is pulling strings. Nobody is issuing step-by-step commands. You declare the desired state, and distributed, independent controllers converge reality toward that declaration.

Puppet (the tool, not the analogy — though the naming here is delightful) works this way. Chef works this way. FluxCD works this way. You write a manifest that says “this package should be installed, this file should have these contents, this service should be running.” An agent on the node reads the manifest, compares it to reality, and fixes whatever doesn’t match. The agent doesn’t need to be told how to install the package. It doesn’t need step-by-step instructions. It knows how. You just tell it what.

The autonomous robot version works the same way. You give a fleet of robots their marching orders: “maintain this perimeter,” “keep this area clean,” “deliver packages to these locations.” Each robot figures out how to accomplish its goal independently. They don’t need a central command issuing turn-by-turn directions. They sense their environment, make decisions locally, and converge toward the stated objective.

Why This Isn’t Just a Preference

I’ve heard every version of “we just need to do orchestration better.” Better tooling. Better automation. Better coordination. More masters. Smarter masters. Masters that manage masters that manage masters.

That’s not a solution. That’s a deeper hole.

The puppet model has a fundamental scaling limit, and it’s not a bug you can fix. It’s the architecture. Every imperative action requires:

A decision maker who knows the current state.
A communication channel to the thing being acted upon.
Correct sequencing relative to every other action happening in the system.
Error handling at the orchestration layer when the action fails.
State tracking at the orchestration layer to know what’s been done and what hasn’t.

Each of these is a coordination point. Each coordination point is a potential failure. The number of coordination points grows combinatorially with the number of things being managed. This is not a linear problem. It’s not even quadratic. Once the number of puppets and masters hits a certain threshold, the system collapses under the weight of its own coordination overhead.

You’ve seen this. You’ve lived this. The deploy pipeline that takes 45 minutes because it has to SSH into 200 nodes sequentially. The Ansible playbook that fails on node 147 and now you have to figure out which of the first 146 nodes got the change and which didn’t. The helm release that partially applied because one API call timed out and now your cluster is in some weird intermediate state that doesn’t match the chart, the values file, or reality.

The declarative model doesn’t have this problem, because the coordination is distributed. Each agent is responsible for its own domain. The Puppet agent on node 147 doesn’t care what’s happening on node 146. The FluxCD kustomize-controller reconciling namespace prod-01 doesn’t care what the kustomize-controller is doing in namespace staging. Each controller reads the desired state, compares it to actual state, and converges. Independently. In parallel. Without coordination.

That’s why it scales. Not because it’s clever. Because the architecture doesn’t require centralized coordination.

The Categorization

Let me be concrete about which is which.

Imperative (Puppets and Puppet Masters)

Shell scripts — You write the steps. They execute in order. If step 5 fails, you decide what happens.
Ansible — You write plays. A control node SSHes into targets and executes tasks in order. It’s automation, and it’s good automation, but it’s a puppet master pulling strings. The control node is the bottleneck, and the playbook is the string.
kubectl apply — A human or pipeline pushes desired state to the API server. One-shot. No convergence loop. If it fails, you run it again. If someone changes something after you apply, nobody notices.
helm install / helm upgrade — Same thing with templates. An external actor renders a chart and pushes the result to the cluster. Once the command exits, helm is done. Drift? Not helm’s problem.
CI/CD pipelines doing deploys — Jenkins, GitHub Actions, whatever. They’re puppet masters. They run a sequence of steps against a target. If the pipeline breaks, the target doesn’t get updated. If someone bypasses the pipeline, nobody knows.

Declarative (Octopus Tentacles)

Puppet — An agent runs on every node. It receives a catalog (desired state), compares it to the node’s actual state, and fixes differences. Every 30 minutes, automatically. No human in the loop.
Chef — Same pattern. The Chef client runs on a schedule, converges the node to the cookbook’s desired state, reports back. Continuous convergence.
FluxCD — Controllers run inside the Kubernetes cluster. They watch a git repository for desired state, compare it to the cluster’s actual state, and fix drift. Continuously. Automatically. If someone kubectl edits something, the controller puts it back.
ArgoCD — Same idea, different architecture. Argo runs a reconciliation loop that compares desired state in git to actual state in the cluster and corrects drift. It’s declarative convergence. Argo has its own scaling challenges due to centralized components (single application controller, shared repo server, Redis cache), but the model is convergence, not orchestration. It’s an octopus tentacle with a bigger brain and a smaller arm — the intelligence is still local to the cluster, still reconciling continuously, still correcting drift without human intervention.
Kubernetes itself — This is the part people miss. The entire Kubernetes control plane is a declarative convergence system. You create a Deployment saying “I want 3 replicas.” The Deployment controller creates a ReplicaSet. The ReplicaSet controller creates Pods. The scheduler assigns Pods to nodes. The kubelet on each node runs the containers. Each controller watches its own domain and converges independently. That’s why Kubernetes scales. Not because it’s complex (it is, painfully so). Because the complexity is distributed across independent controllers, each solving a small piece of the puzzle.

The Tangled Strings

I worked at a place once where the deploy process was a 47-step runbook. Not kidding. Forty-seven steps, some with sub-steps, executed by a human following the runbook while on a bridge call with four other teams. Each team owned different steps. The whole thing took two hours on a good day. On a bad day — and there were many bad days — step 23 would fail, and everyone would spend the next hour figuring out whether it was safe to re-run from step 23 or whether they had to roll back to step 1.

This was the puppet show at its logical extreme. A room full of puppet masters, each holding strings to different parts of the system, all trying not to trip over each other’s strings. The runbook was the script. The bridge call was the coordination layer. And it was fragile. Any deviation from the script — a slow API, a network hiccup, a service that wasn’t ready yet — and the whole show stopped while the masters figured out how to untangle.

You know what replaced it? A GitOps controller that watched a git repo and reconciled the cluster to whatever was committed. The “deploy process” became “merge the PR.” The controller did the rest. No bridge call. No runbook. No 47 steps. No tangled strings.

Was the system any less complex? No. The same services existed. The same dependencies existed. The same failure modes existed. But the coordination model changed. Instead of a centralized puppet show, each piece of the system had an autonomous controller that knew how to converge its domain to the desired state. The controllers didn’t need to coordinate with each other. They just did their jobs.

“But We Can Automate the Orchestration”

Yeah, you can. That’s what Ansible Tower is. That’s what Jenkins pipelines are. That’s what every “orchestration platform” promises. “We’ll automate the puppet master!”

Cool. Now you have an automated puppet master. It’s still a puppet master. It still has to know the current state of everything. It still has to sequence actions correctly. It still has to handle partial failures. It still has to coordinate across all the things it manages. You’ve made the puppet master faster, not fundamentally different.

An automated puppet master hits the same wall. It just hits it later. And when it hits the wall, debugging it is harder, because now you have to understand both the system and the automation layer that’s supposed to be managing it.

The octopus doesn’t have this problem. Add another tentacle, it just works. The new tentacle has its own neurons. It reads the intent, observes its environment, and converges. It doesn’t need to coordinate with the other seven tentacles. The central brain doesn’t get more burdened. The system scales because the intelligence is distributed.

The Point

This isn’t about being smarter or more disciplined with your orchestration. This isn’t about better tools or better runbooks or better automation. The limitation is architectural.

Centralized, imperative orchestration requires centralized coordination. Centralized coordination is a bottleneck. Bottlenecks don’t scale. Full stop.

Distributed, declarative convergence pushes intelligence to the edges. Each agent converges its own domain. No centralized coordination required. No bottleneck. Scales as far as you need it to.

If you’re building systems that need to grow, stop hiring more puppet masters. Start building smarter tentacles.

nikogura.com

Thoughts, opinions, and occasionally rantings of a passionate technologist.