Blog Archive

Gambling on Failure

Most people gamble on success — they assume the thing will work, and they're genuinely surprised when it doesn't. A tiny, birdlike kung fu master taught me to gamble on failure instead. Expect every move to be blocked, and win anyway. It turns out to be the same discipline that keeps systems alive at 2 AM.

DDCRI: Declarative, Deterministic, Continuously Reconciling Infrastructure

What's in git is what's in your infrastructure, or alarms are sounding. DDCRI is the discipline that makes that sentence literally true: FluxCD, Kustomize, Crossplane, and Upjet continuously reconcile a control repository, and drift is wired up as a pageable condition. A canonical, example-driven walkthrough.

Stop Holding Out for a Hero

Incident response is either an engineering discipline — measured, quantified, repeatable, owned, evaluated — or it is a craft a few heroes practice and nobody else can duplicate. Heroes are great. You shouldn't need them, and you shouldn't bet the company on still having them.

Don't Paint Yourself Into a Corner

Larry Wall built Perl around a principle: no unnecessary limitations. Most of the limitations we build into our own code aren't necessary either — they're laziness wearing the costume of caution, and every one is a wet patch of floor between you and the door. Stop boxing in your future self.

Most Infrastructure as Code Is Broken — and Reconciliation Is Only Half the Reason

Run terraform plan against infrastructure nobody has touched in a month and watch it propose changes. That drift is the absence of a reconciliation loop. But the missing loop is only half of why most Infrastructure as Code is broken — and bolting a loop onto the other half just gets you to broken faster.

There's More Than One Way to Get Observability Right

The specialize-versus-unify argument feels like a religious war. It isn't. Both sides are right — they're answering different questions. There are several ways to get observability right. The way to get it wrong is to never ask which one you're building for.

Continuous Acceptance Tests

An acceptance test run once before deploy proves the data was correct for one instant. The data does not stay correct because the deploy was green. Stop retiring your best test the moment it passes. Run it forever.

Put Dex In Front of Google OAuth

Google OAuth has two surprises that make every internal-service auth story uglier than it should be. The standard workaround involves domain-wide delegation and a service account JSON key shipped to every application that wants group-based authorization. There is a much better answer that doesn't require any of that.