Claude Commander

// AUTONOMOUS AI ORCHESTRATION SYSTEM

An autonomous AI that ships real work—without an agent framework.

// THE PROBLEM

Most “AI agents” fail for a boring reason: they don’t have a harness.

They have vibes, not verification. They have loops, not guardrails. They have “DONE” messages... and broken builds.

// THE THESIS

  • Plan: strategy first
  • Verify: reality checks
  • Iterate: until green

...and constrain AI autonomy with reality checks, not prompts asking “are you sure you’re done?”

// WHY RUST

The “agent” part isn’t the hard part—the hard part is process orchestration.

Commander constantly spawns real commands (Claude Code runs, tests, linters), streams logs, enforces timeouts, tracks exit codes, and decides what to do next.

Rust gives a small, fast binary with deterministic state, reliable process control, and structured logging, so the harness stays boring—and that’s the point.
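
That loop of spawn, watch, enforce a deadline, and record the exit code is small enough to sketch with just the Rust standard library (the commands below are placeholders, not Commander's real gates):

```rust
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Spawn a real command and enforce a wall-clock timeout by polling try_wait.
// Returns Some(exit_code) if the process finished in time, None if it was killed.
fn run_with_timeout(program: &str, args: &[&str], timeout: Duration) -> Option<i32> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::null()) // a real harness would stream this instead
        .spawn()
        .ok()?;
    let start = Instant::now();
    loop {
        if let Ok(Some(status)) = child.try_wait() {
            // Finished: report the exit code (killed-by-signal reports -1).
            return Some(status.code().unwrap_or(-1));
        }
        if start.elapsed() > timeout {
            // Timed out: kill the process and report failure.
            let _ = child.kill();
            let _ = child.wait();
            return None;
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() {
    // Illustrative gate; the real harness runs Claude Code, tests, linters.
    match run_with_timeout("sh", &["-c", "exit 0"], Duration::from_secs(5)) {
        Some(code) => println!("gate finished with exit code {code}"),
        None => println!("gate timed out"),
    }
}
```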

// Key insight:
The model is allowed to be probabilistic.
The runner isn’t.

// THE PIPELINE

Phase 1: Plan (Best-of-3)

Planning is where you pay down your biggest risk: picking the wrong approach. Commander generates three candidate plans, and each one must spell out:

  • Concrete file touchpoints (where changes will happen)
  • Runnable acceptance checks (how “done” will be verified)
  • Explicit risks + rollback notes

No plan, no build.
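
The plan contract above can be modeled as plain data. A sketch, with illustrative field names (not Commander's actual schema):

```rust
// One candidate plan from the Plan phase (field names are illustrative).
struct Plan {
    approach: String,
    touchpoints: Vec<String>,       // concrete files the change will touch
    acceptance_checks: Vec<String>, // runnable commands that define "done"
    risks: Vec<String>,             // explicit risks + rollback notes
}

// "No plan, no build": a plan is only buildable when every section is filled in.
fn is_buildable(plan: &Plan) -> bool {
    !plan.touchpoints.is_empty()
        && !plan.acceptance_checks.is_empty()
        && !plan.risks.is_empty()
}

// Best-of-3: take the candidates, keep the first one that qualifies.
// (Real selection would score and compare; this only shows the gate.)
fn pick_plan(candidates: Vec<Plan>) -> Option<Plan> {
    candidates.into_iter().find(|p| is_buildable(p))
}

fn main() {
    let candidates = vec![Plan {
        approach: "refactor the parser".into(),
        touchpoints: vec!["src/parser.rs".into()],
        acceptance_checks: vec!["cargo test -p parser".into()],
        risks: vec!["grammar regressions; rollback: revert the commit".into()],
    }];
    println!("buildable plan found: {}", pick_plan(candidates).is_some());
}
```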

Phase 2: Build (Iterative)

Build is an iterative loop, not a single “generate code” moment.

This is where most “agents” quietly cheat: they let the model decide when it’s done. Commander doesn’t.
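
A sketch of that rule, with the model's work reduced to a comment: the loop exits on green gates or an exhausted budget, never on the model's say-so.

```rust
// The Build loop: the model proposes changes, but only the gates end the loop.
// Returns true when the gates go green, false when the budget runs out.
fn build_loop(max_iterations: u32, mut gates_pass: impl FnMut() -> bool) -> bool {
    for _ in 0..max_iterations {
        // (the model proposes and applies a change here)
        if gates_pass() {
            return true; // green: the repo, not the model, said "done"
        }
        // red: gate failures feed back into the next iteration's context
    }
    false // budget exhausted: escalate to a human instead of claiming "done"
}

fn main() {
    // Simulated gates that pass on the third attempt.
    let mut attempts = 0;
    let done = build_loop(5, || {
        attempts += 1;
        attempts >= 3
    });
    println!("done={done} after {attempts} attempt(s)");
}
```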

Phase 3: Verify (Reality is the referee)

Verify runs the gates. This is the whole point of the harness:

  • Tests pass
  • Lint/typecheck pass
  • Acceptance checks from the plan are satisfied

The model does not decide “done.” Your repo does.
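
A minimal sketch of a gate runner, with gate commands as placeholders (a real harness would also stream logs and report which gate failed):

```rust
use std::process::Command;

// Run every verification gate; all of them must pass.
// Gates here are shell strings; substitute your repo's real commands.
fn verify(gates: &[&str]) -> bool {
    gates.iter().all(|gate| {
        Command::new("sh")
            .arg("-c")
            .arg(gate)
            .status()
            .map(|status| status.success())
            .unwrap_or(false) // a gate that cannot even run counts as a failure
    })
}

fn main() {
    // Placeholder gates; in a real repo these are `cargo test`, lints, etc.
    let gates = ["exit 0", "exit 0"];
    println!("all gates green: {}", verify(&gates));
}
```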

// THE ORCHESTRATOR LLM

The most annoying human-in-the-loop work isn’t writing code. It’s answering phase questions: approving the plan, kicking off the build, confirming the verify run actually passed.

Claude Commander removes that by adding a dedicated Orchestrator LLM—not a coder, not a planner, but a traffic cop.

This turns autonomy into something closer to a state machine than a vibe loop.
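
The phases above can be written down as an actual state machine. A sketch; the real transition policy lives in the Orchestrator:

```rust
// The pipeline as an explicit state machine.
#[derive(Debug, PartialEq)]
enum Phase {
    Plan,
    Build,
    Verify,
    Done,
}

// The Orchestrator's whole job: pick the next phase from hard signals.
fn next_phase(current: Phase, gates_green: bool) -> Phase {
    match (current, gates_green) {
        (Phase::Plan, _) => Phase::Build,       // an accepted plan unlocks Build
        (Phase::Build, _) => Phase::Verify,     // every build iteration ends in Verify
        (Phase::Verify, true) => Phase::Done,   // only green gates reach Done
        (Phase::Verify, false) => Phase::Build, // red gates loop back to Build
        (Phase::Done, _) => Phase::Done,
    }
}

fn main() {
    let mut phase = Phase::Plan;
    for green in [false, false, true] {
        phase = next_phase(phase, green);
        println!("-> {phase:?}");
    }
}
```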

// SECURITY

Claude Code is governed by Anthropic’s safety research—Constitutional AI, RLHF, and ongoing alignment work that shapes how Claude reasons about harm, permissions, and boundaries.

Claude Commander adds another layer: user verification for system commands. Before any shell command executes, the harness can require explicit approval—no silent rm -rf surprises.
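
A sketch of that approval gate, assuming a plain yes/no prompt (the real UX may differ). It fails closed: anything but an explicit "yes" denies the command.

```rust
use std::io::{BufRead, Write};

// Fail-closed approval gate: a command runs only after an explicit "yes".
// Any other answer, or no answer at all, denies it.
fn approve(command: &str, input: &mut impl BufRead, output: &mut impl Write) -> bool {
    let _ = writeln!(output, "about to run: {command} (approve? [yes/NO])");
    let mut answer = String::new();
    if input.read_line(&mut answer).is_err() {
        return false;
    }
    answer.trim().eq_ignore_ascii_case("yes")
}

fn main() {
    let approved = approve(
        "rm -rf build/",
        &mut "no\n".as_bytes(),
        &mut std::io::stdout(),
    );
    println!("approved: {approved}");
}
```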

The Orchestrator LLM also runs expectation checks, flagging suspicious patterns: unexpected file access, network calls outside scope, or commands that don’t match the plan. Autonomy doesn’t mean unsupervised.

Autonomy + guardrails = shipping safely.

// THE KEY INSIGHT

Autonomous systems ship when they’re constrained by reality.

Claude Commander is a harness that forces strategy first, uses Best-of-N where it matters, iterates with real gates, and uses an Orchestrator to eliminate phase babysitting.

// START WITH YOUR GATES, NOT YOUR PROMPTS

Write down your verification harness first. What commands prove the change is real? What’s forbidden? What does “done” mean in the repo, not in the model’s head?
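
One way to make that concrete is to check the gates in before writing any prompt. A sketch, with example commands (substitute whatever proves the change is real in your repo):

```rust
// Gates first: these commands, not the prompts, define "done" for the repo.
// (Commands are examples, not Commander's built-in defaults.)
const GATES: &[&str] = &[
    "cargo build",
    "cargo test",
    "cargo clippy -- -D warnings",
];

// Forbidden patterns: things the harness should flag or block outright.
const FORBIDDEN: &[&str] = &[
    "rm -rf", // destructive commands need explicit approval
    "curl ",  // no network calls outside the plan's scope
];

fn main() {
    println!("{} gates, {} forbidden patterns", GATES.len(), FORBIDDEN.len());
}
```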