My post on the spectrum of coding agents is not even cold yet, and already two new names are trending: OpenAI's Codex in the cloud and Google's Jules.
On paper they feel familiar. Picture the terminal-based coding agents but with a web UI glued on top and the development environment moved to the cloud.
To evaluate them fairly, I presented both with the same task: migrating SQLAlchemy to SQLModel.
My verdict: it does not work.
Both claimed success, yet neither achieved working static typing (pyright) or passing tests.
Jules did not even attempt to run any linting or tests.
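To make "claimed success" checkable, here is a minimal sketch of the gate I would expect an agent to run before reporting a task as done. The helper names (`run_checks`, `migration_succeeded`, `DEFAULT_CHECKS`) are hypothetical, and it assumes `pyright` and `pytest` are installed and available on the PATH of the environment the agent works in.

```python
import subprocess
from typing import Dict, List

# Assumed project setup: both tools are installed and on PATH.
DEFAULT_CHECKS: Dict[str, List[str]] = {
    "typing": ["pyright"],
    "tests": ["pytest", "-q"],
}


def run_checks(checks: Dict[str, List[str]]) -> Dict[str, bool]:
    """Run each named command and record whether it exited with status 0."""
    results: Dict[str, bool] = {}
    for name, command in checks.items():
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failed check, not a skipped one.
            results[name] = False
    return results


def migration_succeeded(results: Dict[str, bool]) -> bool:
    """Report success only if at least one check ran and all of them passed."""
    return bool(results) and all(results.values())
```

Calling `migration_succeeded(run_checks(DEFAULT_CHECKS))` from the repository root returns `True` only when both the type check and the test suite exit cleanly. An agent that never runs the checks has nothing to base a success claim on.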
Part of the problem lies in their approach to development environments.
Rather than using a Dockerfile as the environment definition, they rely on prompt-based instructions (see AGENTS.md).
I could envision this working for simple tasks, similar to Sweep's original vision: a junior developer that handles issues and creates pull requests through GitHub integration.
(I recently discovered that Sweep has since pivoted to developing coding assistants for JetBrains.)
Skipping the step of defining a development environment does not work for complex tasks yet.
One interesting observation: while Jules demonstrated better understanding of SQLModel and thus performed the conversion more effectively, it chose to refactor my Python code into something Java-like. This was neither requested nor Pythonic. Codex, on the other hand, was hallucinating the SQLModel API, but at least maintained the original coding style.
I am confident both will improve over time, but I do not yet see them as a threat to existing coding agents. They represent a new (browser-based) interface for coding agents, rather than a new class of agents. The UI/UX benefits simple issues, but requires direct integration with CI to be truly valuable. Once they natively integrate with GitHub or your preferred distributed VCS, I would certainly consider using this coding agent interface for simple issues (initially).
Less than a day has passed, and GitHub has announced its own cloud-based coding agent: GitHub Copilot Coding Agent. In this post, I argued that a cloud-based coding agent should be more closely integrated with the version control system. I got what I asked for!