nikogura.com

Thoughts, opinions, and occasionally rantings of a passionate technologist.

Puppets and Octopi: Why Top-Down Orchestration Hits a Wall

You can do orchestration right. You can do it really, really right. You can have the best runbooks, the cleanest pipelines, the most disciplined team. It doesn’t matter. At a certain scale, things start to go wrong.

This isn’t a skill problem. It’s a physics problem.

Two Models of Getting Things Done

There are fundamentally two ways to manage complex systems at scale. I’ve been thinking about this for years, and the best analogies I’ve found are puppets and octopi.

The Puppet Show

Imperative, action-based orchestration is a puppet show. You have puppet masters pulling strings, and puppets doing what they’re told. The master says “move your left arm,” and the puppet moves its left arm. The master says “walk forward,” and the puppet walks forward. Every action originates from above. Every motion requires a string.

This is how most people think about managing systems, because it’s intuitive. You want something done, you tell something to do it. Shell scripts. Ansible playbooks. kubectl apply. helm install. A CI pipeline that SSHs into a server and runs commands. A human typing things into a terminal. An operator clicking buttons in a UI.

It works. I’m not saying it doesn’t work. For a single puppet, it works beautifully. For five puppets, it works fine. For ten, you’re getting pretty good at this. For fifty, you’re sweating. For five hundred, you’re drowning.

Here’s why: every puppet needs strings. Every string goes to a master. The master has two hands. You can add more masters, but now the masters need to coordinate with each other. “I’m moving puppet 47’s left leg, don’t touch puppet 48 yet because they’re linked.” Now you have masters managing masters. Strings everywhere. The whole thing becomes a nightmare of coordination, timing, and entanglement.

Too many cooks in the kitchen. Not because any individual cook is bad. The problem is the kitchen.

The Octopus

Octopi have brains in their tentacles. That is barely a metaphor: roughly two-thirds of an octopus’s neurons sit in its arms, and each arm can process sensory input and execute motor commands largely on its own. The central brain sends high-level intent (“grab that crab”), and the tentacle figures out the details. How to reach, how to grip, how to navigate around obstacles: that’s all handled locally.

This is declarative convergence. You state what you want. Autonomous agents figure out how to get there. Nobody is pulling strings. Nobody is issuing step-by-step commands. You declare the desired state, and distributed, independent controllers converge reality toward that declaration.

Puppet (the tool, not the analogy — though the naming here is delightful) works this way. Chef works this way. FluxCD works this way. You write a manifest that says “this package should be installed, this file should have these contents, this service should be running.” An agent on the node reads the manifest, compares it to reality, and fixes whatever doesn’t match. The agent doesn’t need to be told how to install the package. It doesn’t need step-by-step instructions. It knows how. You just tell it what.
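The read, compare, fix loop described above can be sketched in a few lines. This is a toy illustration, not how Puppet, Chef, or FluxCD are actually implemented: the `Node` class, the manifest layout, and the `reconcile` function are all hypothetical names invented for the example, and "installing" a package is just adding a string to a set.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for the actual state of one machine."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)
    running: set = field(default_factory=set)


def reconcile(desired: dict, node: Node) -> list:
    """Compare desired state to actual state and fix only what differs.

    Returns the corrective actions taken (empty if already converged).
    """
    actions = []
    for pkg in sorted(desired.get("packages", [])):
        if pkg not in node.packages:
            node.packages.add(pkg)  # the agent knows *how*; we only said *what*
            actions.append(f"install {pkg}")
    for path, content in sorted(desired.get("files", {}).items()):
        if node.files.get(path) != content:
            node.files[path] = content
            actions.append(f"write {path}")
    for svc in sorted(desired.get("services", [])):
        if svc not in node.running:
            node.running.add(svc)
            actions.append(f"start {svc}")
    return actions


# Assumed manifest shape, made up for this sketch.
manifest = {
    "packages": ["nginx"],
    "files": {"/etc/motd": "hello\n"},
    "services": ["nginx"],
}
node = Node(packages={"curl"})

first = reconcile(manifest, node)   # fixes three differences
second = reconcile(manifest, node)  # nothing left to fix
print(first)
print(second)
```

The second call returning an empty list is the property that matters: running the agent again is always safe, because it acts on the difference between desired and actual state rather than on a list of steps.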

The autonomous robot version works the same way. You give a fleet of robots their marching orders: “maintain this perimeter,” “keep this area clean,” “deliver packages to these locations.” Each robot figures out how to accomplish its goal independently. They don’t need a central command issuing turn-by-turn directions. They sense their environment, make decisions locally, and converge toward the stated objective.

Why This Isn’t Just a Preference

I’ve heard every version of “we just need to do orchestration better.” Better tooling. Better automation. Better coordination. More masters. Smarter masters. Masters that manage masters that manage masters.

That’s not a solution. That’s a deeper hole.

The puppet model has a fundamental scaling limit, and it’s not a bug you can fix. It’s the architecture. Every imperative action requires:

  1. A decision maker who knows the current state.
  2. A communication channel to the thing being acted upon.
  3. Correct sequencing relative to every other action happening in the system.
  4. Error handling at the orchestration layer when the action fails.
  5. State tracking at the orchestration layer to know what’s been done and what hasn’t.

Each of these is a coordination point. Each coordination point is a potential failure. The number of coordination points does not grow linearly with the number of things being managed. Pairwise dependencies alone grow quadratically, and the possible orderings and partially applied states grow combinatorially on top of that. Once the number of puppets and masters hits a certain threshold, the system collapses under the weight of its own coordination overhead.
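To put rough numbers on that growth, here is a back-of-the-envelope sketch. The model is an assumption made for illustration, not a measurement: it counts one potential coordination link per pair of managed things, and one possible "who has the change and who doesn't" state per subset of them. Pairwise links alone are quadratic; it is the partial states that blow past anything a central orchestrator can track.

```python
from math import comb


def pairwise_links(n: int) -> int:
    """Potential 'don't touch B while I'm touching A' links between n things."""
    return comb(n, 2)


def partial_states(n: int) -> int:
    """Ways a change can be applied to some of n things and not the others."""
    return 2 ** n


for n in (5, 50, 500):
    digits = len(str(partial_states(n)))
    print(f"{n:>3} puppets: {pairwise_links(n):>6} pairwise links, "
          f"partial states is a {digits}-digit number")
```

At 5 puppets that is 10 links and 32 possible partial states. At 500 it is 124750 links, and the count of partial states no longer fits comfortably on a line.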

You’ve seen this. You’ve lived this. The deploy pipeline that takes 45 minutes because it has to SSH into 200 nodes sequentially. The Ansible playbook that fails on node 147, and now you have to figure out which of the first 146 nodes got the change and which didn’t. The Helm release that partially applied because one API call timed out, leaving your cluster in some weird intermediate state that doesn’t match the chart, the values file, or reality.
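The node 147 scenario is easy to reproduce in miniature. The sketch below is a deliberately simplified simulation: `imperative_rollout` and the node names are hypothetical, and a real push can also fail halfway through a single node, which makes the picture worse than what is shown here.

```python
def imperative_rollout(nodes: dict, version: str, fails_on: str) -> None:
    """Push `version` to each node in order and stop at the first failure."""
    for name in nodes:
        if name == fails_on:
            raise RuntimeError(f"push to {name} failed")
        nodes[name] = version


# 200 simulated nodes, all starting on v1.
nodes = {f"node{i:03d}": "v1" for i in range(1, 201)}

try:
    imperative_rollout(nodes, "v2", fails_on="node147")
except RuntimeError as err:
    print(err)

# The orchestrator now has to work out what state the fleet is in.
changed = [name for name, version in nodes.items() if version == "v2"]
unchanged = [name for name, version in nodes.items() if version == "v1"]
print(len(changed), "nodes on v2;", len(unchanged), "nodes still on v1")
```

Nothing in the fleet knows it is half-upgraded. That knowledge exists only in whatever the orchestrator managed to record before it stopped.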

The declarative model doesn’t have this problem, because the coordination is distributed. Each agent is responsible for its own domain. The Puppet agent on node 147 doesn’t care what’s happening on node 146. The FluxCD kustomize-controller reconciling namespace prod-01 doesn’t care what the kustomize-controller is doing in namespace staging. Each controller reads the desired state, compares it to actual state, and converges. Independently. In parallel. Without coordination.
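Here is a minimal sketch of that independence, under some loud assumptions: the thread pool is only a stand-in for agents that would really run on their own nodes, and `converge`, `DESIRED`, and the `unreachable` flag are hypothetical names made up for the example. The thing to notice is that each agent looks only at its own node, and one node having a bad day changes nothing for the other 199.

```python
from concurrent.futures import ThreadPoolExecutor

DESIRED = "v2"  # the declared state every agent converges toward


def converge(node: dict) -> str:
    """One agent's pass: inspect only this node, never its neighbours."""
    if node.get("unreachable"):
        return "retry-later"
    if node["version"] != DESIRED:
        node["version"] = DESIRED
        return "changed"
    return "in-sync"


fleet = [{"name": f"node{i:03d}", "version": "v1"} for i in range(1, 201)]
fleet[146]["unreachable"] = True  # node147 is down during the first pass

# First pass: every agent runs independently and in parallel.
with ThreadPoolExecutor(max_workers=16) as pool:
    first_pass = list(pool.map(converge, fleet))

# node147 comes back; the next pass picks it up with no special handling.
fleet[146]["unreachable"] = False
with ThreadPoolExecutor(max_workers=16) as pool:
    second_pass = list(pool.map(converge, fleet))

print(first_pass.count("changed"), first_pass.count("retry-later"))
print(second_pass.count("changed"), second_pass.count("in-sync"))
```

There is no rollback logic and no record of which nodes were done. The failed node simply converges on a later pass, because the question each agent asks is "do I match the desired state?" rather than "which step was I on?"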

That’s why it scales. Not because it’s clever. Because the architecture doesn’t require centralized coordination.

The Categorization

Let me be concrete about which is which.

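Imperative (Puppets and Puppet Masters)

Shell scripts. Ansible playbooks. kubectl apply. helm install. CI pipelines that SSH into servers and run commands. Humans typing into terminals. Operators clicking buttons in a UI. Ansible Tower and Jenkins pipelines belong here too: an automated puppet master is still a puppet master.

Declarative (Octopus Tentacles)

Puppet. Chef. FluxCD and other GitOps controllers. Anything where an agent reads the desired state, compares it to reality, and converges on its own.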

The Tangled Strings

I worked at a place once where the deploy process was a 47-step runbook. Not kidding. Forty-seven steps, some with sub-steps, executed by a human following the runbook while on a bridge call with four other teams. Each team owned different steps. The whole thing took two hours on a good day. On a bad day — and there were many bad days — step 23 would fail, and everyone would spend the next hour figuring out whether it was safe to re-run from step 23 or whether they had to roll back to step 1.

This was the puppet show at its logical extreme. A room full of puppet masters, each holding strings to different parts of the system, all trying not to trip over each other’s strings. The runbook was the script. The bridge call was the coordination layer. And it was fragile. Any deviation from the script — a slow API, a network hiccup, a service that wasn’t ready yet — and the whole show stopped while the masters figured out how to untangle.

You know what replaced it? A GitOps controller that watched a git repo and reconciled the cluster to whatever was committed. The “deploy process” became “merge the PR.” The controller did the rest. No bridge call. No runbook. No 47 steps. No tangled strings.
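In spirit, that controller's job looks something like the sketch below. This is an assumption-heavy toy, not FluxCD's or any real controller's code: `FakeRepo` stands in for a git repository, the cluster is a plain dict, and `sync_once` is a hypothetical name for one pass of the reconcile loop.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRepo:
    """Hypothetical stand-in for a git repo: a list of committed desired states."""
    commits: list = field(default_factory=list)

    def merge(self, desired: dict) -> None:
        self.commits.append(dict(desired))

    def head(self) -> tuple:
        """Return (revision number, desired state) for the latest commit."""
        return len(self.commits), self.commits[-1]


def sync_once(repo: FakeRepo, cluster: dict) -> str:
    """One controller pass: make the cluster match whatever is at HEAD."""
    revision, desired = repo.head()
    if cluster == desired:
        return f"rev {revision}: in sync"
    cluster.clear()
    cluster.update(desired)
    return f"rev {revision}: applied"


repo = FakeRepo()
repo.merge({"api": "1.4.0", "web": "2.1.0"})
cluster = {}

log = [sync_once(repo, cluster)]              # first pass applies rev 1
log.append(sync_once(repo, cluster))          # nothing to do
repo.merge({"api": "1.5.0", "web": "2.1.0"})  # "merge the PR"
log.append(sync_once(repo, cluster))          # controller picks up rev 2
cluster["web"] = "0.0.1"                      # someone hand-edits the cluster
log.append(sync_once(repo, cluster))          # drift is reverted
print(log)
```

The deploy and the repair of a hand-edited cluster go through the same code path. There is no step 23 to resume from, only a desired state and a difference to close.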

Was the system any less complex? No. The same services existed. The same dependencies existed. The same failure modes existed. But the coordination model changed. Instead of a centralized puppet show, each piece of the system had an autonomous controller that knew how to converge its domain to the desired state. The controllers didn’t need to coordinate with each other. They just did their jobs.

“But We Can Automate the Orchestration”

Yeah, you can. That’s what Ansible Tower is. That’s what Jenkins pipelines are. That’s what every “orchestration platform” promises. “We’ll automate the puppet master!”

Cool. Now you have an automated puppet master. It’s still a puppet master. It still has to know the current state of everything. It still has to sequence actions correctly. It still has to handle partial failures. It still has to coordinate across all the things it manages. You’ve made the puppet master faster, not fundamentally different.

An automated puppet master hits the same wall. It just hits it later. And when it hits the wall, debugging it is harder, because now you have to understand both the system and the automation layer that’s supposed to be managing it.

The octopus doesn’t have this problem. Add another tentacle, it just works. The new tentacle has its own neurons. It reads the intent, observes its environment, and converges. It doesn’t need to coordinate with the other eight tentacles. The central brain doesn’t get more burdened. The system scales because the intelligence is distributed.

The Point

This isn’t about being smarter or more disciplined with your orchestration. This isn’t about better tools or better runbooks or better automation. The limitation is architectural.

Centralized, imperative orchestration requires centralized coordination. Centralized coordination is a bottleneck. Bottlenecks don’t scale. Full stop.

Distributed, declarative convergence pushes intelligence to the edges. Each agent converges its own domain. No centralized coordination required. No bottleneck. Scales as far as you need it to.

If you’re building systems that need to grow, stop hiring more puppet masters. Start building smarter tentacles.