We are still far from it. Same issue as with robots: you have to build a new environment to let them work efficiently, or at least heavily adjust the existing one.
This will be the next step, and it will be nasty: making our workflows "agent-friendly" even if they are less convenient for humans. And then, yes, partial replacement.
The current challenge is not to create a patch, but to verify it.
Testing a fix in a big application is a very complex task. First of all, you have to reproduce the issue and verify the steps (or create them, because many issues don't contain a clear description). Then you switch to the fixed version and make sure the issue no longer exists. Finally, you apply a little exploratory testing to make sure the fix hasn't corrupted neighbouring logic (deep application knowledge is required for this).
To perform these steps you have to deploy staging with the original and fixed versions, or run everything locally and do the pre-setup (create users, entities, etc. to reach the corrupted state).
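The verification loop described above can be sketched in code. Everything here (the fake deployment, the truncated-username bug, the helper names) is invented purely to make the steps concrete:

```python
# Hypothetical sketch of the manual fix-verification loop.
# `deploy_staging` and the "truncates_names" bug are made up; in reality
# these would be real deployments and real UI/API reproduction steps.

def deploy_staging(version: str) -> dict:
    """Pretend deployment: returns an app state for the given version."""
    # Invented bug: the original version truncates long usernames.
    return {"version": version, "truncates_names": version == "original"}

def pre_setup(app: dict) -> dict:
    """Create the users/entities needed to reach the corrupted state."""
    app["user"] = "a-very-long-username"
    return app

def reproduce_issue(app: dict) -> bool:
    """Run the reproduction steps; True means the bug is present."""
    name = app["user"]
    shown = name[:10] if app["truncates_names"] else name
    return shown != name

# Step 1: reproduce the issue on the original version.
original = pre_setup(deploy_staging("original"))
assert reproduce_issue(original), "can't reproduce -> nothing to verify"

# Step 2: switch to the fixed version and confirm the issue is gone.
fixed = pre_setup(deploy_staging("fixed"))
assert not reproduce_issue(fixed), "fix didn't help"

# Step 3 (exploratory testing of neighbouring logic) still needs a
# human with deep application knowledge.
print("fix verified")
```

Note that steps 1 and 2 are mechanical once the reproduction steps exist; it's producing those steps and the pre-setup that demands the project knowledge agents currently lack.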
This is a very challenging area for current agents. Right now they simply can't do these steps: their mental models aren't ready for this level of integration with the app and infrastructure. And creating 3/5/10/100 unverified pull requests just slows down the software development process.
Are you sure about "all"? Because I mentioned not only environment deployment, but also functional issue reproduction through the UI/API, which also requires the necessary pre-setup.
Automated tests partially cover this case, but in the real world no one writes tests blindly. It's always manual work: once the failing trajectory is clear, the test is written.
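For example, once the failing trajectory is known, pinning it down is cheap. A minimal sketch (the `slugify` function and its bug report are invented for illustration):

```python
# Hypothetical regression test, written only AFTER manual reproduction
# made the failing trajectory clear -- not blindly in advance.

def slugify(title: str) -> str:
    """Made-up function under test: lowercases a title and joins with dashes."""
    return "-".join(title.lower().split())

def test_slugify_keeps_every_word():
    # The (invented) bug report showed a word being dropped for this
    # exact input; the test pins that trajectory so the fix can't
    # regress silently.
    assert slugify("Hello Brave World") == "hello-brave-world"

test_slugify_keeps_every_word()
print("regression test passed")
```

The hard part is everything before this point: knowing which input matters and which behaviour is the correct one.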
Theoretically, an agent can interact with the UI or API. But that requires deep project understanding, gathered from code, documentation, git history, tickets, and Slack. And obtaining this context, building an easily accessible knowledge base, and pouring only the necessary parts into the agent's context is still an unsolved task.
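To make "pre-setup before reproduction" concrete: even a trivial API-level reproduction needs project-specific knowledge of which entities must exist first. A toy sketch with an invented in-memory API (the double-discount bug and all names are hypothetical):

```python
# Toy in-memory "API" standing in for a real service. An agent would
# need project knowledge just to learn that an order requires an
# existing user AND product before the buggy endpoint can be hit.

class FakeAPI:
    def __init__(self):
        self.users, self.products, self.orders = {}, {}, []

    def create_user(self, uid):
        self.users[uid] = {"id": uid}

    def create_product(self, pid, price):
        self.products[pid] = {"id": pid, "price": price}

    def create_order(self, uid, pid):
        price = self.products[pid]["price"]
        # Invented bug: a 10% discount is applied twice for price >= 100.
        if price >= 100:
            price = price * 0.9 * 0.9
        self.orders.append({"user": uid, "total": price})
        return self.orders[-1]

api = FakeAPI()

# Pre-setup the agent must somehow know is required:
api.create_user("u1")
api.create_product("p1", price=100)

# Only now can the issue be reproduced: total should be 90.0 but isn't.
order = api.create_order("u1", "p1")
assert order["total"] != 90.0
print("issue reproduced, total =", order["total"])
```

None of the required setup is written down in the bug report; it lives in code, docs, and tribal knowledge, which is exactly the context-gathering problem.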
If your CI/CD process were able to fully verify a fix, it would have stopped the bug from making it to production the first time around, and the Jira ticket that was handed to multiple LLMs would never have existed.
There is no fundamental blocker to agents doing all those things. It's mostly a matter of constructing the right tools and grounding, which can be a fair amount of up-front work. Arming LLMs with the right tools and documentation got us this far. There's no reason to believe that path is exhausted.
The author proposed a one-line solution, but the ensuing discussion includes analysis of the RFC, potential negative outcomes, and different ways to fix it.
And without a deep understanding of the project, it's not clear how to fix it properly without damaging backward compatibility and neighbouring functionality.
Also, such a fix must be properly tested manually, because even well-designed autotests don't match the actual flow 100%.
You can explore other open and closed issues and the corresponding discussions. And this is the complexity level of real software, not pet projects or simple apps.
I guess the existing attention mechanism is the fundamental blocker, because it's barely able to process all the context required for a fix.
Have you tried building agents? They will go from PhD-level smart to making mistakes a middle schooler would find obvious, even on models like gemini-2.5 and o1-pro. It's almost like building a sandcastle: once you get a prompt working, you become afraid to make any changes because something else will break.
I think the issue right now is so many people want to believe in the moonshot and are investing heavily in it, when the reality is we should be focusing on the home runs. LLMs are a game changer, but there is still A LOT of tooling that can be created to make it easier to integrate humans in the loop.
Correct! Over at https://ghuntley.com/mcp I propose that each company develops their own tools for their particular codebase that shapes LLM actions on how to work with their codebase.
A big plus imo is that you can check the files in and version-control them alongside your application. Very convenient when working on microservices in a team.
Yes, I think Rivet, langflow, and flowise all came to the idea of visual programming for LLMs in parallel. I like to think it's a good sign that visual programming is a powerful paradigm here :)
I do think Rivet hits a pretty different use-case. Beyond being in the TypeScript ecosystem, Rivet's remote debugging and embeddability are pretty unique, and super critical!
Well, it has a Radar layer (for Europe, the US, and some other regions) and a Satellite layer (whole world), which show you current conditions. It can also show meteo stations with current values of temperature, pressure, and so on.