Building Your Own Agent Harness

I’ve been trying to write about coding agents for a while. Each time I sit down, the ground has shifted. The models change, my own workflow changes, and whatever I had to say feels stale before I finish saying it.

But one thing has stayed constant through all of it: the agents that produce good work are the ones that know how I work. My conventions, my patterns, my preferences. The ones that fail are the ones I drop into a codebase with no context and expect to read my mind. Complaining that an agent wrote bad code without giving it any of that context is like calling a new hire incompetent on their first day because they didn’t already know your codebase.

The models have also improved to a point where which model you use matters less than how you use it. The gap between Claude, Gemini, and GPT shrinks with every release. LangChain demonstrated this on a benchmark: they improved their coding agent from 52.8% to 66.5% on Terminal Bench 2.0 by changing only the harness while keeping the model fixed. Same model, better harness, better results.

Mitchell Hashimoto gave this practice a name in February 2026: harness engineering. OpenAI and Martin Fowler picked it up.

Hashimoto’s definition is reactive: anytime an agent makes a mistake, engineer a fix so it doesn’t happen again. I think of it more broadly. A harness is a set of skills, workflows, and methodology that teaches your agent how you think and how you build. The models are unpredictable by nature. You can’t change that. But you can teach them a methodology that makes success the more likely outcome rather than the lucky one.

Atelier is my harness. This post explains why I built it and how, but the idea is bigger than one project.

Everyone landed here independently

Boris Tane wrote a blog post that nails the core principle:

never let Claude write code until you’ve reviewed and approved a written plan.

He describes separating planning from execution as the single most important thing he does. The rest of his workflow follows from that:

  • research the codebase first
  • write a plan in a markdown file
  • annotate the plan back and forth until it’s right
  • then implement

His annotation cycle is the part worth paying attention to. He adds inline notes directly into the plan document, sends the agent back to update it, and repeats until human and agent are aligned, before a single line of code gets written. The plan becomes shared mutable state between you and the agent.

The same pattern showed up independently across the community. HumanLayer called it “RPI” (Research -> Plan -> Implement). Jesse Vincent’s Superpowers landed on the same shape, now with 40k+ stars. Atelier is my version of the same idea. The convergence says more about the pattern than about any one implementation.

The harness doesn’t need to be complicated. It needs to exist.

What a harness is made of

In Atelier, the research -> plan -> implement loop runs through four skills:

spec:research explores the codebase, reads existing patterns, surfaces relevant context, and produces a spec.md, the research artifact that everything downstream depends on.

spec:plan takes that research and breaks it into tasks, producing a plan.json. This is the most important step. A good plan means the agent can commit to a specific approach and execute without constant course-correction. A bad plan means you spend tokens going back and forth mid-implementation, which is worse than spending that time upfront.

spec:implement executes the plan with TDD. Each task gets built, tested, verified.

spec:finish validates the whole thing and runs a review pass.

There’s also spec:orchestrator, which handles skill routing: it decides which skill to load based on what you’re doing.
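To make the plan.json step concrete, here is a minimal sketch of what a task plan with dependency ordering might look like. The post doesn’t show Atelier’s actual schema, so the field names (`id`, `status`, `dependsOn`) and the helper are illustrative assumptions, not the real format.

```typescript
// Hypothetical plan.json shape -- field names are illustrative,
// not Atelier's actual schema.
type TaskStatus = "pending" | "in_progress" | "done";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  dependsOn: string[]; // ids of tasks that must finish first
}

interface Plan {
  spec: string; // path to the spec.md the plan was derived from
  tasks: Task[];
}

// Pick the next pending task whose dependencies are all done,
// so implementation can proceed without course-correction.
function nextTask(plan: Plan): Task | undefined {
  const done = new Set(
    plan.tasks.filter(t => t.status === "done").map(t => t.id)
  );
  return plan.tasks.find(
    t => t.status === "pending" && t.dependsOn.every(id => done.has(id))
  );
}

const plan: Plan = {
  spec: "spec.md",
  tasks: [
    { id: "t1", title: "Add user table", status: "done", dependsOn: [] },
    { id: "t2", title: "Auth endpoints", status: "pending", dependsOn: ["t1"] },
    { id: "t3", title: "Session middleware", status: "pending", dependsOn: ["t2"] },
  ],
};

console.log(nextTask(plan)?.id); // "t2"
```

The point of an explicit artifact like this is that both human and agent can read and amend the same state, which is what makes the backflow described below possible.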

graph LR
    R[spec:research] --> P[spec:plan]
    R -- "review & annotate" --> H[human]
    H -- "update spec" --> R
    P --> I[spec:implement]
    I --> F[spec:finish]
    I -- "gaps found" --> P

    style R stroke:#4a9eff,stroke-width:2px
    style P stroke:#f59e0b,stroke-width:2px
    style I stroke:#10b981,stroke-width:2px
    style F stroke:#8b5cf6,stroke-width:2px
    style H stroke:#ef4444,stroke-width:2px

This isn’t waterfall. Backflow is expected. You review the research, annotate it, and loop with the agent until the spec is right. Implementation can still find gaps in the plan and push back. When that happens you go back, update, and continue. That’s what separates it from rigid spec-driven development that breaks the moment reality shows up.

Different kinds of harness components

Atelier has 34 skills organized into three namespaces, and the distinction between them matters because different kinds of knowledge should behave differently in an agentic context.

spec: skills are sequential. They produce artifacts (documents, plans, code) and they’re meant to be followed closely. You invoke them explicitly or a previous skill triggers them.

oracle: skills are advisory. oracle:architect applies DDD patterns and thinks about component responsibilities. oracle:challenge pushes back on your approach and pokes holes in your design. These adapt to context rather than following a rigid procedure.

code: skills are utilities (review, commit), the kind of thing you reach for when you need it.

A spec workflow needs structure and artifacts, a thinking tool should adapt, and a commit helper just needs to work when called. Collapsing everything into one format loses that distinction.

Skills auto-load when relevant. Say “create a spec for user auth” and the agent matches that to spec:research. Language-specific skills (Drizzle, Fastify, FastAPI, SQLAlchemy, and others) activate based on what you’re working in.
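As a rough sketch of that matching, an orchestrator can route a request by comparing it against trigger phrases each skill advertises. This is a simplified assumption about how the routing might work, not Atelier’s actual implementation; the skill names come from the post, but the trigger lists are made up.

```typescript
// Illustrative description-based skill routing -- not Atelier's real logic.
interface SkillEntry {
  name: string;
  triggers: string[]; // phrases suggesting this skill is relevant
}

const registry: SkillEntry[] = [
  { name: "spec:research", triggers: ["create a spec", "research", "explore the codebase"] },
  { name: "spec:implement", triggers: ["implement the plan", "build", "execute"] },
  { name: "code:commit", triggers: ["commit", "write a commit message"] },
];

// Return the first skill whose triggers appear in the request.
function routeSkill(request: string): string | undefined {
  const text = request.toLowerCase();
  return registry.find(s => s.triggers.some(t => text.includes(t)))?.name;
}

console.log(routeSkill("create a spec for user auth")); // "spec:research"
```

Real skill loading is fuzzier than substring matching, since the model itself decides relevance from skill descriptions, but the shape is the same: knowledge waits in a registry until context pulls it in.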

How mine evolved

Atelier didn’t start here. The first version, back in August 2025, was rigid. Subagents chained in sequence, waterfall-style. Context-reader feeds requirements-gatherer feeds spec-writer. It worked in a narrow sense but couldn’t handle the back-and-forth that real development requires.

By January 2026 I’d moved to a living spec.md model with delta tracking for brownfield changes (ADDED, MODIFIED, REMOVED markers). Closer, but still too ceremonial.

The current version dropped the ceremony, and agent skills are the reason. Before skills, you had agents, commands, and reference docs, and you tried to chain them together yourself. You had to remember which command to run, when to invoke which agent, which doc to point it at. Skills changed that. They let you capture knowledge in a form the agent can find on its own. You describe what you want to do, and the agent loads the relevant skill by context. The knowledge is there when it’s needed without you having to be the orchestrator.

Skills load by context. The spec workflow handles the structured parts, and the oracle skills handle the rest. The name stuck through all of it. An atelier is a workshop where a master and assistants work together. That’s the literal intent. The agent is the assistant, the codebase is the workshop, and the skills encode how I like things done.

Each iteration was me learning what the harness actually needed. The rigid version taught me that backflow matters. The ceremonial one, that process should be invisible until you need it. The current version reflects both.

Build your own harness

Install Atelier if you want: npx skills add martinffx/atelier. Or grab specific skills: npx skills add martinffx/atelier --skill spec:research.

But the real suggestion is to build your own. My TypeScript opinions aren’t yours. My architecture preferences won’t match your codebase. The value of Atelier isn’t in my specific skills. It’s in the idea that your agent’s harness should be engineered with the same care as the software it produces.

Start with the research -> plan -> implement loop. Get that working in your own way. Add skills for how you write code, how you test, how you think about design. Boris Tane’s annotation cycle is meticulous, Superpowers is a full framework, and Atelier sits somewhere in between. They all converge on the same observation: the discipline matters more than the tools. The harness just encodes that discipline so you don’t have to remember it each time.

Fork Atelier, take what works, throw away what doesn’t.

Further reading