Jason Liu’s Codex-maxxing gave me a cleaner way to think about why some agent workflows feel magical and others feel like babysitting.
The difference is not just model quality. It is whether the work has an operating loop.
A useful agent workflow needs a few things around the model:
- a durable thread where the work can continue
- shared memory outside the chat, preferably in files you can inspect and diff
- a way to steer the agent while it is already working
- tools that can touch the real surfaces where work happens
- artifacts you can review directly, not just descriptions of artifacts
- a verification loop that tells the agent when the task is actually done
That last point matters. “Implement this plan” is weak because the finish line is vague. “Port this library and keep going until the original test suite passes” is stronger because the agent has an oracle: an objective check it can run to find out whether it is finished.
The framing I liked most: agents become more useful when work has somewhere to live. A repo can hold code, but long-running knowledge work needs its own memory layer too: notes, decisions, open loops, people, context, and project state. Otherwise every session starts by reconstructing the same background again.
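One way to picture that memory layer is a folder of plain Markdown files that any session can append to and read back. The sketch below is only an illustration under that assumption: the function names, the one-file-per-kind layout (`decisions.md`, `open_loops.md`, and so on), and the dated bullet format are all made up for the example.

```python
from datetime import date
from pathlib import Path


def append_note(memory_dir: Path, kind: str, text: str) -> Path:
    """Append one dated bullet to <memory_dir>/<kind>.md and return the path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{kind}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {date.today().isoformat()}: {text}\n")
    return path


def load_context(memory_dir: Path) -> str:
    """Concatenate every note file into one block a new session can read."""
    parts = []
    for path in sorted(memory_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8")
        parts.append(f"## {path.stem}\n{body}")
    return "\n".join(parts)
```

Because the notes are ordinary files, they can sit in version control next to the code, so you can inspect and diff what the agent wrote down instead of trusting a chat transcript.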
For my own workflow, the takeaway is simple: stop treating agents as prompt boxes. Treat them more like long-running workers with a notebook, a queue, a workspace, and a test plan.
Better prompts help. Better loops compound.