Blog Archive
Gambling on Failure
Most people gamble on success — they assume the thing will work, and they're genuinely surprised when it doesn't. A tiny, birdlike kung fu master taught me to gamble on failure instead. Expect every move to be blocked, and win anyway. It turns out to be the same discipline that keeps systems alive at 2 AM.
DDCRI: Declarative, Deterministic, Continuously Reconciling Infrastructure
What's in Git is what's in your infrastructure, or alarms are sounding. DDCRI is the discipline that makes that sentence literally true: FluxCD, Kustomize, Crossplane, and Upjet continuously reconcile a control repository, with drift wired up as a pageable condition. A canonical, example-driven walkthrough.
Stop Holding Out for a Hero
Incident response is either an engineering discipline — measured, quantified, repeatable, owned, evaluated — or it is a craft a few heroes practice and nobody else can duplicate. Heroes are great. You shouldn't need them, and you shouldn't bet the company on still having them.
Don't Paint Yourself Into a Corner
Larry Wall built Perl around a principle: no unnecessary limitations. Most of the limitations we build into our own code aren't necessary either — they're laziness wearing the costume of caution, and every one is a wet patch of floor between you and the door. Stop boxing in your future self.
Most Infrastructure as Code Is Broken — and Reconciliation Is Only Half the Reason
Run terraform plan against infrastructure nobody has touched in a month and watch it propose changes. That drift is what a missing reconciliation loop looks like. But the missing loop is only half of why most Infrastructure as Code is broken, and bolting a loop onto the other half just gets you to broken faster.
There's More Than One Way to Get Observability Right
The specialize-versus-unify argument feels like a religious war. It isn't. Both sides are right — they're answering different questions. There are several ways to get observability right. The way to get it wrong is to never ask which one you're building for.
Continuous Acceptance Tests
An acceptance test run once before deploy proves the data was correct for one instant. The data does not stay correct because the deploy was green. Stop retiring your best test the moment it passes. Run it forever.
Put Dex In Front of Google OAuth
Google OAuth has two surprises that make every internal-service auth story uglier than it should be. The standard workaround involves domain-wide delegation and a service account JSON key shipped to every application that wants group-based authorization. There is a much better answer that doesn't require any of that.