
nikogura.com

Thoughts, opinions, and occasionally rantings of a passionate technologist.

Engineering Standards



Core Philosophy

Security, reliability, and compliance are non-negotiable. Every line of code is a potential attack vector or compliance violation. Move deliberately, not fast. Test all assumptions.

The Master’s Approach

Excellence is a muscle. The more you exercise it, the stronger it becomes. “Doing it right” gets faster with practice, and faster work becomes better with repetition. That’s mastery: producing master-level work with less effort than it takes an apprentice to pick up the tools.

Write every line of code as if you’ll publish it under your own name. Even if a repo will “never see the light of day,” we owe it to ourselves and our successors to maintain excellence. Git is forever. If your name is on it, make it a lesson in coding excellence.

Code Reusability

Design your code so it can be reused and refactored quickly. This isn’t about perfection—it’s about making it easy for your future self to swap out clunky implementations in 5-10 minutes of free time.

Technical debt doesn’t get paid down due to lack of desire; it persists due to lack of time. Front-load your pain by writing tests and modular code. If refactoring is easy and provable through tests, it will get done. Make refactoring easy.


Golang Best Practices

Go Proverbs

Rob Pike said it best: https://go-proverbs.github.io/

Key proverbs we emphasize:

Write Golang as Golang

Golang is interface-oriented, not object-oriented. This is subtle but fundamental. Each language has syntax, but also idiom. Learn both, or you’re missing out on what makes the language worth using.
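For example, here’s a minimal sketch of the interface-oriented idiom (the `Saver` and `Report` names are hypothetical): the consumer declares the small interface it needs, and any type with a matching method set satisfies it implicitly.

```go
package main

import (
	"io"
	"strings"
)

// Saver is declared by the consumer: a one-method interface describing
// the only behavior the caller actually needs.
type Saver interface {
	Save(w io.Writer) error
}

// Report satisfies Saver implicitly - no "implements" declaration.
type Report struct {
	Body string
}

func (r Report) Save(w io.Writer) (err error) {
	_, err = w.Write([]byte(r.Body))
	return err
}

// saveToString accepts anything that can Save itself - Report today,
// something else tomorrow, a mock in tests.
func saveToString(s Saver) (out string, err error) {
	var b strings.Builder
	err = s.Save(&b)
	out = b.String()
	return out, err
}
```

Small interfaces at the point of use, rather than large ones at the point of definition, are what makes Golang code easy to mock and refactor.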

Keep packages flat. Labyrinthine package trees get complicated quickly. If you really need a different package, make it a totally different module with its own lifecycle in a different repo.

SOLID/MVC Principles in Golang

While SOLID was written for OOP (and Golang isn’t OO), the principles remain the same:

Libraries (Models)

Views

Tests

Code Layout

Follow https://github.com/golang-standards/project-layout

Key points:

Generate standardized repos: Use something like the boilerplate tool to generate recognizable code for yourself or your organization.

The more familiar it looks, the easier it is for anyone to onboard or jump in and lend a hand.

Public vs Private

Default to public unless you can articulate why something needs to be private. Don’t impose restrictions on your future self without good reason.

In Golang:

This is especially important for Golang beginners. Some error messages are esoteric, and words are used in slightly different ways that will bite you.

Building Code

Use standard conventions:

go build  # Produces binary named after directory

Don’t do this:

go build -o main .  # Binary named "main" is meaningless

Seeing /app/main in a container tells you nothing. Seeing /app/my-cool-program provides information.

Generally, a single repository should build a single binary and be buildable with simply go build. If the repo is laid out differently or contains multiple binaries, perhaps you’re being overly clever. Clear is better than clever. Simple is easier to debug in a crisis.


Code Quality & Linting

Golang Linting is Law

golangci-lint is the standard. All Go code must pass golangci-lint with the project’s standardized configuration.

Named Returns Are Mandatory

The namedreturns linter is MANDATORY (in my world, anyway). It’s a custom linter, not included in golangci-lint.

Named returns provide critical information to both engineers and compilers.

Why Named Returns Matter:

// WRONG - Unnamed returns lack context.  You're missing an opportunity to make your code self-documenting here.
func Foo() (string, error) {
    return "", nil
}

// WRONG - Named returns, but returning something different from what's promised - this leads to confusion and surprises.  In a big, complex function you might be returning something unexpected. Better to be explicit than surprised.
func Foo() (output string, err error) {
    return "", nil
}

// RIGHT - Named returns document your intent, make the code clear, and make it easy to review.
func Foo() (output string, err error) {
    output = "bar"
    return output, err
}

Honor the Function Contract:

Common Lint Patterns

1. Globals and init functions. Cobra and Prometheus both make heavy use of global variables and init(). In the case of Prometheus, this is part of what makes Prometheus so easy to implement. We need to allow this pattern in order to use these common modules. However, if you haven’t thought it through to the level that the Prometheus authors have, globals and init functions are probably something to avoid.

//nolint:gochecknoglobals // Cobra boilerplate 
var rootCmd = &cobra.Command{...}

//nolint:gochecknoinits // Cobra boilerplate
func init() {...}

2. Avoid Nested Closures with Returns

// ANTI-PATTERN  Yes, it works, but it's harder to read and to trace.  Think about who's maintaining this code!
func Outer() (result string, err error) {
    token, err := jwt.Parse(str, func(t *jwt.Token) (interface{}, error) {
        if check { return nil, errors.New("bad") }
        return key, nil
    })
    result = token.Raw
    return result, err
}

// CORRECT - extract to top-level function.  It's just clearer, and clear is better than clever.
func lookupKey(token *jwt.Token, config Config) (key interface{}, err error) {
    return key, err
}

func Outer() (result string, err error) {
    token, err := jwt.Parse(str, func(t *jwt.Token) (interface{}, error) {
        return lookupKey(t, config)
    })
    result = token.Raw
    return result, err
}

3. Reduce Cognitive Complexity

Extract helper functions instead of deeply nested logic:

func validateAudience(claims jwt.MapClaims, expected string) bool {...}
func extractGroups(claims jwt.MapClaims) []string {...}
func validateGroupMembership(userGroups, allowedGroups []string) bool {...}
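As a sketch of what one of these helpers might look like (the signature comes from the list above; the body is assumed):

```go
package main

// validateGroupMembership reports whether the user belongs to at least one
// allowed group.  Flat, single-purpose, and trivial to test in isolation.
func validateGroupMembership(userGroups, allowedGroups []string) (ok bool) {
	allowed := make(map[string]bool, len(allowedGroups))
	for _, g := range allowedGroups {
		allowed[g] = true
	}
	for _, g := range userGroups {
		if allowed[g] {
			ok = true
			return ok
		}
	}
	return ok
}
```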

4. Avoid Inline Error Handling - the expanded form is easier to read, and are you really worried about the extra newline? Sure, in the inline version the variable is scoped to the conditional, but is that really more important than readability? Your functions shouldn’t be that big in any case. Optimize for your future self, who might not be as clever as you feel today.

// WRONG
if err := doSomething(); err != nil {...}

// RIGHT
err := doSomething()
if err != nil {...}

5. Proto Field Access

req.GetSourceBucket()  // CORRECT
req.SourceBucket       // WRONG
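The getters matter because protobuf-generated Go code makes them nil-safe. Here’s a hand-rolled illustration of the convention the generated code follows (`Req` is a stand-in, not real generated code):

```go
package main

// Req mimics the shape of a protobuf-generated message.
type Req struct {
	SourceBucket string
}

// GetSourceBucket follows the generated-code convention: pointer receiver,
// zero value returned when the receiver is nil, so chained access through
// nested messages never panics.
func (r *Req) GetSourceBucket() (bucket string) {
	if r == nil {
		return bucket
	}
	bucket = r.SourceBucket
	return bucket
}
```

Direct field access on a nil message panics; the getter quietly returns the zero value.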

Integrating namedreturns into Build Process

The namedreturns linter must be integrated into your make lint target.

Pattern 1: Simple Projects (No Generated Code)

For projects without generated code, use the simple pattern:

# Run golangci-lint with namedreturns
lint:
	@echo "Running namedreturns linter..."
	namedreturns ./...
	@echo "Running golangci-lint..."
	golangci-lint run

Key points:

Pattern 2: Projects with Generated Code

For projects with generated code (protobuf, GraphQL, OpenAPI, etc.), exclude generated packages:

lint:
	@echo "Running namedreturns linter..."
	@for pkg in $(shell go list ./pkg/... ./cmd/... | grep -v 'generated/package/path$$' | grep -v 'another/generated/path$$'); do \
		namedreturns -test=true $$pkg || exit 1; \
	done
	@echo "Running golangci-lint..."
	golangci-lint run --timeout=5m

Exclusion examples:

Key points:


Repository Structure

Standard Dockerfile

Use multi-stage builds with distroless images for minimal attack surface and image size:

FROM golang:1.23.4 AS builder

WORKDIR /app

# Setup Git for SSH access to private repos
RUN git config --global url."git@github.com:".insteadOf "https://github.com/"
RUN mkdir -p ~/.ssh && ssh-keyscan -H github.com >> ~/.ssh/known_hosts
RUN go env -w GOPRIVATE="github.com/<YOUR_ORG>/*"

# Copy dependency files first for layer caching
COPY go.mod go.sum ./
RUN --mount=type=ssh go mod download

# Copy source code
COPY . .

# Build static binary
RUN CGO_ENABLED=0 GOOS=linux go build .

# Use distroless for minimal runtime
FROM gcr.io/distroless/base-debian12:nonroot

# Copy binary with correct ownership
COPY --from=builder --chown=nonroot:nonroot /app/<binary-name> /app/

ENTRYPOINT ["/app/<binary-name>"]

Key points:

Building with SSH:

docker build --ssh default .

Container Consistency

Use the same container for all stages:

Alpine uses musl libc instead of glibc. Binaries compiled on Alpine won’t run on Debian. These errors cost large amounts of debugging hours. Avoid them entirely by using the same containers for build, test, and deployment (or at least the same OS family).


Test-Driven Development (TDD)

TDD is the law. All new features and changes MUST include test coverage.

If someone finds a bug in your software, and it is not exposed in your test suite, then that is your first problem.

No code should ever be touched without expanding the test suite.

First, write a new test or fix an existing one. In TDD this is the “Red” stage. Teach your test suite to recognize the flaw reported by your user.

Next, fix the problem. In TDD, this is the “Green” stage - now you’ve fixed the problem, and the test, which is now passing, lives on in the codebase, forever protecting you from this mistake.

It might not be much, but these tests build up over time, like drops in a bucket. You don’t gain a lot from the first one, but over time you can ship more, and more complex, changes with confidence, because today’s functional test is tomorrow’s regression test.

Test Requirements

Test Quality Standards

Running Tests

If someone asks you to implement something and doesn’t mention tests:

Example workflow:

  1. Write Tests for new Feature
  2. Implement feature
  3. Run tests: go test ./...
  4. Run linters: golangci-lint run
  5. Only then is the feature complete

Error Handling

Errors Are Values

Don’t just check errors to satisfy the compiler. Use the value. When constructing error handling routines, make them useful.

Always Do This

thing := "foo"
err := DoSomethingWith(thing)
if err != nil {
    err = errors.Wrapf(err, "failed to do something with %s", thing)
    return err
}

Add context to errors. Make it easy to grep the code for the error message.

Never Do This

Don’t ignore errors:

thing, _ := SomethingThatReturnsThingAndErr()  // WRONG - errors are hidden

Don’t use boilerplate error messages:

const SOME_ERROR = "something went wrong"
return errors.New(SOME_ERROR)  // WRONG - can't grep for it.  You might throw this error at multiple places in multiple files.

Don’t compress error handling:

// WRONG - hard to read
if err := priceConverter.Start(ctx); err != nil {
    return fmt.Errorf("could not start price converter: %w", err)
}

// RIGHT - clear and readable
err := priceConverter.Start(ctx)
if err != nil {
    err = errors.Wrapf(err, "could not start price converter for %s", thing.Name())
    return err
}

Don’t Panic

panic() and recover() are built into Golang, but we should almost never use them. Simply put, there are better mechanisms to handle errors. Most of the time, panicking is a sign of laziness or lack of concern for maintainability.

Especially avoid deferred panic recovery:

// HORRIBLE - don't do this!  You can't easily tell where this got thrown!
defer func() {
    if err := recover(); err != nil {
        logrus.WithField("err", err).Error("panic-discovered")
        sentry.CaptureException(err.(error))
        panic(err)  // Re-panicking is even worse
    }
}()

This gives you no information about where the error came from. In an 800-line function, good luck finding the error point quickly.

Exit with Logs and Return Codes

Services should not exit unless they cannot continue. When you must exit:

Standardized error codes are an eventual goal. In the short term, even exit(1) is preferable to panic or no exit at all.

Don’t leave us hanging. Getting paged for a Kubernetes pod in CrashLoopBackoff state is not how we should learn about problems.
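A sketch of the pattern (the run function is hypothetical): log the reason, then exit nonzero, so both the orchestrator and the human reading the pod logs know what happened.

```go
package main

import (
	"log"
	"os"
)

// run does the real work and reports failure as a value.
func run() (err error) {
	// ... real work here; return a wrapped error when something breaks
	return err
}

func main() {
	err := run()
	if err != nil {
		// Log first, then exit nonzero - never panic, never exit silently.
		log.Printf("fatal: %v", err)
		os.Exit(1)
	}
}
```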

Return Early

Check errors and bail out of functions as soon as you find one you can’t handle. Golang generally avoids else and prefers early returns.
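A sketch with a hypothetical validator: each failure returns immediately, so the happy path reads straight down the left margin with no else branches.

```go
package main

import "errors"

// parseAge bails out at the first check it can't satisfy.
func parseAge(input int) (age int, err error) {
	if input < 0 {
		err = errors.New("age cannot be negative")
		return age, err
	}
	if input > 150 {
		err = errors.New("age is implausibly large")
		return age, err
	}
	age = input
	return age, err
}
```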


External Connections

Every system making external connections must support:

  1. Rate limits that default to 0 (no limit)
  2. Retries with exponential backoff
  3. Clear logging - we should always know what’s happening
  4. Metrics on:
    • Number of calls by method
    • Number of errors
    • Durations
    • Backoff period
    • All broken down by method
  5. Never exit or panic on failed connection - workload should continue to self-heal

Kubernetes pods entering crashloop due to connection errors is an anti-pattern to avoid at all costs.

Retries and Exponential Backoff

We live in an imperfect world. Code the assumption that things won’t always work into your services.

In Kubernetes, pods are mortal. This means the fundamental unit of work is expected to run, die, and be replaced at any time. A CrashLoopBackoff is not an error in itself. Something stuck in CrashLoopBackoff for too long is.

Assume whatever you’re connecting to may not be there or may not answer. Incorporate retry logic, but endless retries become the cause of further errors. Retries need exponential backoff.


Kubernetes & GitOps

GitOps Principles

GitOps, coined by Weaveworks, is the practice of controlling infrastructure declaratively through Git.

The commands kubectl apply and helm install are not GitOps. If you’re using these commands directly against a cluster, you’re manually managing infrastructure—and losing the audit trail, reproducibility, and drift correction that GitOps provides.

Core tenets:

  1. Git is the single source of truth for desired state
  2. Changes to Git trigger changes to infrastructure
  3. Drift from desired state is automatically corrected
  4. Every change is auditable through commit history

Evaluate Before You Apply

Never apply unevaluated resources to clusters:

# Avoid these patterns
kubectl apply -f https://raw.githubusercontent.com/.../deploy.yaml
helm install my-release some-chart
kubectl apply -k <url>

Instead:

  1. Download and review the manifests locally
  2. Use helm template to render charts, not helm install
  3. Understand how each resource affects your security posture
  4. Commit the evaluated resources to Git
  5. Let a GitOps controller like FluxCD apply them

This isn’t bureaucracy—it’s the difference between knowing what runs in your cluster and hoping for the best.

Control Repositories

Control Repositories contain Infrastructure as Code configuration. The main controls needed:

All of the above comes for free with a VCS, so heavyweight code reviews and pull requests can add unnecessary burden and slow down delivery velocity.

That’s not to say there shouldn’t be PRs and code reviews. We use the same tools (Git, etc.) for control repositories and code repositories, but we don’t necessarily need to use the tools in the same way.

Much of the time, ceremonies on a control repository can be reduced or abbreviated. It’s not that we don’t want reviews or communication, but we also don’t want to needlessly distract or get into a pattern where people don’t read or understand the changes and just ‘rubber stamp’ approvals.

Kubernetes Resource Approaches

There are four main ways of managing resources in Kubernetes: raw YAML manifests applied with kubectl, Helm charts, Kustomize overlays, and Jsonnet.

You’ll encounter all four eventually; you can’t really work in Kubernetes without understanding them. Among them, Jsonnet is the author’s preferred solution due to its flexibility and resistance to blind application. Having everything explicitly laid out in code that is checked in also makes it really easy for an agent to read, understand, and diagnose problems, and better yet, fix them.


Proto Files: Interface Definitions, Not Data Authorities

Proto files define interfaces between services, not authoritative data types. Their primary function is to define message structures for communication, ensuring consistency in API contracts.

Why This Matters

Your application logic should dictate data types—not proto files. Proto messages are transport representations, optimized for serialization. They’re not designed to be the authoritative source of core data structures.

Pitfalls of Using Proto Messages as Core Types

A Better Approach

  1. Define your own structs - Domain-driven structs that fit your application
  2. Convert between internal structs & proto messages - Map when communicating
  3. Keep protos focused on API contracts - Define how services interact, nothing more
  4. Never reuse tags - Changing a field? Use a new tag number

Example:

// WRONG - Proto-centric
import "github.com/myorg/protos/common"

func GetBlockchainID() common.Blockchain {
    return common.Blockchain_ETH  // Hard-coupled to proto
}

// RIGHT - Decoupled
type Blockchain int

const (
    Ethereum Blockchain = 1
    Bitcoin  Blockchain = 2
)

func GetBlockchainID() Blockchain {
    return Ethereum
}

// Convert only when needed
func ToProtoBlockchain(b Blockchain) common.Blockchain {
    return common.Blockchain(b)
}

Think of proto files like blueprints for service contracts—not laws governing your entire codebase.


Observability

Understand the Difference between Metrics and Logs

Logs are expensive. Absent complicated log parsing mechanisms, they’re a stream of noise nobody reads.

Metrics, being essentially numbers, are easy for machines to parse, store, and understand. We prefer to expose critical information via metrics since it’s easy to:

Logging “all the things, all the time” is not sustainable. Metrics are much cheaper and easier to handle.

That’s not to say that logs are unimportant. Metrics in systems like Prometheus scale with the cardinality of labels. Cardinality is a huge concern with metrics systems.

Logs, on the other hand, allow near-infinite cardinality. Some data - individual requests, say - are better captured by logs.

Any real system will incorporate both.

Real-World Example: WAF Security Logs

Web Application Firewall (WAF) logs demonstrate the complementary nature of metrics and logs.

The Problem: WAF systems like ModSecurity generate structured JSON logs for every blocked request. Each log contains:

Storing this in Prometheus would be impossible - the cardinality would explode the metrics database.

The Solution: Hybrid Approach

Metrics for Aggregates:

# Track overall block rate
rate(modsec_requests_blocked_total[5m])

# Alert on anomalies (more than 100 blocks per minute)
rate(modsec_requests_blocked_total[1m]) * 60 > 100

# Track by attack type (low cardinality)
modsec_blocks_by_type{type="SQLi"}
modsec_blocks_by_type{type="XSS"}
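The cardinality difference can be made concrete with a small sketch (the types are hypothetical): the metrics view collapses unbounded per-request detail into a handful of counters keyed only by attack type, which is the shape a metric like modsec_blocks_by_type stores cheaply.

```go
package main

// blockEvent is one WAF block - the high-cardinality record that belongs
// in logs (IP, path, and payload vary with every request).
type blockEvent struct {
	ClientIP   string
	Path       string
	AttackType string
}

// metricsView collapses events into low-cardinality counters, with the
// attack type as the only label dimension.
func metricsView(events []blockEvent) (counts map[string]int) {
	counts = make(map[string]int)
	for _, e := range events {
		counts[e.AttackType]++
	}
	return counts
}
```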

Logs for Details:

Example Elasticsearch Ingest Pipeline:

{
  "processors": [
    {
      "geoip": {
        "field": "client",
        "target_field": "geoip"
      }
    },
    {
      "script": {
        "source": "if (ctx.transaction?.messages != null) { def ruleId = ctx.transaction.messages[0]?.details?.ruleId; if (ruleId != null) { if (ruleId.startsWith('930')) { ctx.attack_type = 'Path Traversal/LFI'; } else if (ruleId.startsWith('941')) { ctx.attack_type = 'XSS'; } else if (ruleId.startsWith('942')) { ctx.attack_type = 'SQLi'; } } }"
      }
    }
  ]
}

Query Pattern:

# Metrics: "Are we under attack right now?"
curl 'prometheus:9090/api/v1/query?query=rate(modsec_blocks[1m])'

# Logs: "Who attacked us and what did they try?"
curl -H 'Content-Type: application/json' 'elasticsearch:9200/modsec-*/_search' -d '{
  "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
  "aggs": {
    "by_country": {"terms": {"field": "geoip.country_name"}},
    "by_attack": {"terms": {"field": "attack_type"}}
  }
}'

Investigation Automation:

The diagnostic-slackbot project demonstrates automated WAF analysis:

Key Insight: Use metrics for real-time alerting and dashboards. Use logs for forensic analysis and investigation. Neither is sufficient alone.

Prefer Continuous Systems to Jobs

When possible, prefer services that run continuously over Jobs and CronJobs.

If you need a periodic task, code the period into the service itself with internal timers. This provides easier monitoring via Prometheus rather than using a PushGateway.

Prometheus supports the PushGateway, but the support is limited. Prometheus was designed to periodically scrape a long-running service. Pushing metrics runs counter to the basic Prometheus design philosophy, and therefore it’s not always as reliable as scraping. The last thing we want is to lose information.

Services with internal cron-like tasks must:

List All Metrics in the README

The README alone should be sufficient to set up dashboards and alerts. List all metrics and labels.


Security & Compliance

Non-Negotiables

Investigation Methodology

Critical: Don’t Assume, Investigate

Systematic Debugging Process:

  1. Start with the obvious: recent deployments, known incidents
  2. Check metrics for anomalies
  3. Correlate with logs
  4. Cross-reference with system events
  5. Be systematic—don’t jump to conclusions
  6. Document your investigation path

For security tool blocks (WAF, etc.):


Documentation

Documentation Must

  1. Reside in the repo for the code it describes
  2. Be written in Markdown
  3. Be accessible via link from Notion
  4. Be stored in top-level /docs directory
  5. Be renderable in any format from Markdown source
  6. Be rendered in PDF format
  7. Be named clearly in CamelCase (e.g., ServiceArchitecture.pdf)

Documentation Must Not

  1. Be located in multiple places
  2. Be stored in dead-end formats from which it can’t be easily extracted or re-rendered

Rendering

Use pandoc to render Markdown to nearly any format:

pandoc -f markdown -t pdf -o NameOfDoc.pdf <source file>

Warning: the LaTeX toolchain pandoc uses for PDF output is huge. Be prepared for a large download.

Documentation examples should follow this pattern, with PDFs generated from Markdown source.


Communication Style


What NOT to Do


CI/CD & GitHub Actions

Continuous Integration and Continuous Deployment are fundamental to modern software delivery. Every repository should have automated testing, linting, and release management.

Reference Implementation

For a complete, production-ready GitHub Actions workflow, see:

GitHub Actions Reference Implementation

This reference provides:

Key Requirements

Every Go project should have a .github/workflows/ci.yml that:

  1. Runs tests - make test must pass
  2. Runs linting - Both golangci-lint AND namedreturns
  3. Caches dependencies - Go modules and Docker layers
  4. Enforces branch protection - Tests must pass before merge
  5. Automates releases - Tag and publish on main branch pushes

Essential Makefile Targets

.PHONY: test lint

test:
	go test -v ./... -count=1 --cover

lint:
	@echo "Running namedreturns linter..."
	namedreturns ./...
	@echo "Running golangci-lint..."
	golangci-lint run

Why This Matters

See the complete reference for detailed configuration, customization options, and troubleshooting.


Remember

Every decision has security, compliance, and reliability implications. When uncertain about whether something meets standards, err on the side of caution and ask.

Excellence is not optional. It’s a practice, a discipline, and ultimately, a habit. Make it yours.


“I will find a way, or I will make one.” - Hannibal Barca