My dev flow

Published on May 8, 2026

A snapshot of my development flow, so I can look back at it in three months and reminisce about the good old times.

Goal

Get as much work done as possible. Two axes to scale on: have agents do more valuable work on their own (scale up), and have more agents working in parallel (scale out).

Scale up

Scale up is about giving a single agent enough context and autonomy to run for a long time and do the right thing on its own.

I no longer do much ralphing (a bash loop that re-runs the same prompt until the agent eventually gets it right) or inter-agent debating (having Claude and Codex hash a question out iteratively until they converge). Both still have their place, but the funnel below has become my default.

Stage one is discovery. I lean on skills here — recipes for turning production data (logs, traces, events, and so on) into design decisions before any code changes. Code search is part of the same step; before I plan anything, I want to know what is actually there. Most of this looks like asking the agent a lot of questions.

Stage two is planning. I have the agent draft a plan in Claude Code's plan mode; when it finishes, Claude Code writes the plan to a markdown file and prints the path. I open the file in nvim and leave inline annotations next to the parts I want changed. The agent re-reads the file with my comments and updates the plan; the cycle repeats until I am happy.

Stage three is execution. The agent implements the plan, opens the PR as a draft, and polls CI itself. Skills handle evals and local-environment testing — the agent runs them, reads the output, and iterates without me babysitting.
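
As a rough illustration of the "open a draft, then poll CI" step, here is a minimal sketch. It assumes the repository is on GitHub and that the GitHub CLI (gh) is installed and authenticated; the post does not say which commands the agent actually runs, and the function name pr_draft_and_wait is made up for this example.

```bash
# Hypothetical helper: push the current branch, open a draft PR, wait for CI.
# Assumes a GitHub remote named "origin" and an authenticated gh CLI.
pr_draft_and_wait() {
  # Publish the branch so the PR has something to point at.
  git push -u origin HEAD || return 1
  # --draft keeps the PR out of reviewers' queues; --fill takes the title
  # and body from the branch's commits.
  gh pr create --draft --fill || return 1
  # --watch polls until the checks finish.
  gh pr checks --watch
}
```

Checks can take a few seconds to register after the PR is created, so a real version would probably retry the last step.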

Stage four is review. A single /pr-review skill kicks off three reviews in parallel: two flavors of Claude review and one Codex review. On the remote side, Cursor's Bugbot adds another independent pass. Only after all of those are happy (and CI is green) does the PR move from draft to ready for human eyes.
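
The fan-out-then-gate shape of this stage can be sketched in plain shell. This is not the /pr-review skill itself, only an illustration of running several reviewers in parallel and passing only when all of them pass. The function name run_reviews and the reviewer script names in the usage comment are placeholders.

```bash
# Run each argument as a shell command in the background, wait for all of
# them, and return 0 only if every one exited successfully.
run_reviews() {
  local pids="" pid failed=0 cmd
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  for pid in $pids; do
    # wait returns the exit status of that specific background job.
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Usage (script names are placeholders, not real tools):
#   run_reviews ./review-claude-a.sh ./review-claude-b.sh ./review-codex.sh \
#     && gh pr ready
```

Collecting every exit status before deciding means one slow reviewer never hides another's failure. Assuming the GitHub CLI is in use, gh pr ready is the command that moves a PR out of draft.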

Across all four stages, a well-tended AGENTS.md carries the project-specific knowledge that lets the agent skip ramp-up. I lean on subagents heavily, especially for investigations that would otherwise blow up the main context window.

Scale out

Scale out is about running the above on as many features or issues in parallel as I can keep in my head. Today that's around six tmux windows, one per parallel task, with multiple panes per window — averaging one and a half agents per task on top of supporting panes. Day-to-day, the game is whack-a-mole: keep each agent fed with its next input so none of them sit idle. Underneath that, there's a slower game of raising the brain-fry ceiling — practice and tooling both lift it.

First precondition: removing visual noise. The only two apps I keep open are the browser and the terminal. Slack, email, and calendars live in the browser; manual code inspection, note-taking in nvim, and the agents themselves all live in the terminal. Anything outside those two apps would be one more thing my biological RAM has to keep alive.

Inside the terminal, the base unit is a git worktree per feature, hosted in its own tmux window. Inside that window, several panes do the work: the agents (Claude Code and Codex side by side), the dev server (on its own port so parallel worktrees don't collide), and one or more logpanes for evals, reviewers, or other long-running things. On the browser side, I rename the corresponding tab in Zen to match, so I can keep track of which worktree I am poking at. A separate tmux window stays open for reviewing other people's PRs, so I can drop into it without losing my place.
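
One way to give each worktree's dev server its own port is to derive the port from the worktree name. The sketch below is an assumption about how that could work, not a description of the actual setup; the name wt_port, the base port 3000, and the range of 1000 ports are all made up for this example.

```bash
# Map a worktree name to a stable port in the range 3000-3999.
# With no argument, the current directory's name is used.
wt_port() {
  local name="${1:-$(basename "$PWD")}"
  local sum
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
  echo $(( 3000 + sum % 1000 ))
}

# Usage (the PORT variable is an assumption about the dev server):
#   PORT="$(wt_port)" npm run dev
```

The same name always yields the same port, so bookmarks and browser tabs stay valid across restarts. Two names can still hash to the same port, so a real version would want a collision check.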

The glue is bash functions in my dotfiles. wtnew creates a worktree at ../wt-<name> with a fresh branch from origin/main, configures the environment, and renames the tmux window. wtrm tears one down. gcob creates a fresh branch off origin/main with --no-track. logpane runs a command, redirects its output to a logfile that a small tmux split tails live, and rings the terminal bell when the command finishes. That bell is also how Claude Code talks back: a small script on the Stop and Notification hooks fires a macOS notification with a one-line summary (the last user prompt, truncated) and rings the same bell. tmux picks it up and turns the window's entry in the status line red, so I can tell at a glance which agent needs me back without having to look pane by pane. Stitched together, this reproduces a better Conductor for me — tailored to my specifics and endlessly customizable.
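
For a sense of scale, here is a minimal sketch of what these four functions might look like. The names and the ../wt-<name> path come from the description above; everything else (the tmux split size, the temp-file location, running from the main checkout) is an assumption. The project-specific environment setup is left out.

```bash
# Create ../wt-<name> on a fresh branch from origin/main and rename the window.
# Assumes it is run from the main checkout.
wtnew() {
  local name="$1"
  git fetch origin main || return 1
  git worktree add -b "$name" "../wt-$name" origin/main || return 1
  cd "../wt-$name" || return 1
  # Project-specific environment setup would go here.
  if [ -n "${TMUX:-}" ]; then tmux rename-window "$name"; fi
}

# Remove the worktree; the branch itself is left alone in this sketch.
wtrm() {
  git worktree remove "../wt-$1"
}

# Fresh branch off origin/main without upstream tracking.
gcob() {
  git fetch origin main && git checkout --no-track -b "$1" origin/main
}

# Run a command with output sent to a logfile, tail that file in a small
# tmux split, and ring the bell when the command finishes.
logpane() {
  local log rc
  log="$(mktemp "${TMPDIR:-/tmp}/logpane.XXXXXX")"
  if [ -n "${TMUX:-}" ]; then
    # -d keeps focus in the current pane; -l 10 makes the split 10 lines tall.
    tmux split-window -d -l 10 "tail -f '$log'"
  fi
  "$@" >"$log" 2>&1
  rc=$?
  printf '\a'   # terminal bell
  return "$rc"
}
```

On the tmux side, turning monitor-bell on and setting window-status-bell-style to a red style is one way to get the red status-line entry.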

Not every window needs my attention all the time, and two patterns handle the rest. The parking lot is a tmux session called parking; when I am temporarily paused on a window — most often awaiting PR feedback — I move it there from the default session, where it sits out of sight until I need it back. Shortcuts handle the move in both directions. For waits that aren't running under logpane, I have unread N: it sleeps N seconds in a pane, then rings the bell. When I am waiting on something slow, I run unread 1800 and forget about it; the bell pulls me back when the wait is up.
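
Both patterns are small enough to sketch. The session name parking and the command name unread come from the text above; the function names park and unpark are made up for this example, and the real shortcuts are more likely tmux key bindings than shell functions.

```bash
# Sleep for N seconds, then ring the terminal bell.
unread() {
  sleep "$1" && printf '\a'
}

# Move the current window into the "parking" session, creating it if needed.
park() {
  tmux has-session -t parking 2>/dev/null || tmux new-session -d -s parking
  # A trailing colon with no index targets the session and lets tmux pick
  # a free window index in it.
  tmux move-window -t parking:
}

# Bring a parked window (by name or index) back into the current session.
unpark() {
  tmux move-window -s "parking:$1" -t "$(tmux display-message -p '#S'):"
}
```

Running unread 1800 in a spare pane is then a 30-minute timer that uses the same bell-to-status-line path as everything else.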

Meta

A few habits matter more than any specific tool.

The first is keeping .agents/ under continuous update. Skills, agents, recipes, and project conventions get added the moment I find a pattern worth re-using, so the next agent run starts smarter, more contextually aware, and more independent than the last. This is what compounds on scale up.

The second is keeping dotfiles under continuous update. Whenever I find myself doing the same thing twice in the shell or in tmux, that is a signal to lift it into the dotfiles repo so the next worktree is set up faster. This is what compounds on scale out.

Remote

Tailscale handles the connectivity part. I SSH back into my laptop from anywhere and attach to the running tmux session, which gives me the full agent fleet as if I never left. In practice I rarely do this; the setup is mostly there for the day I need it.

Beyond Tailscale, there is now Claude's remote control. It drives a single Claude session, whereas SSH over Tailscale gives me the whole tmux setup, so it is less capable for managing a fleet. The UX is nicer, though, especially on mobile.