Hacker Newsnew | past | comments | ask | show | jobs | submit | denidoman's commentslogin

We are still far from it. Same issue as with robots: you have to build a new environment to let them work efficiently, or at least severely adjust the existing one.

This will be the next step, and it will be nasty: changing our workflows to "agent-friendly" even if they less convenient for humans. And then - yes, partial replacement.


The current challenge is not to create a patch, but to verify it.

Testing a fix in a big application is a very complex task. First of all, you have to reproduce the issue, to verify steps (or create them, because many issues don't contain clear description). Then you should switch to the fixed version and make sure that the issue doesn't exists. Finally, you should apply little exploratory testing to make sure that the fix doesn't corrupted neighbour logic (deep application knowledge required to perform it).

To perform these steps you have to deploy staging with the original/fixed versions or run everything locally and do pre-setup (create users, entities, etc. to achieve the corrupted state).

This is very challenging area for the current agents. Now they just can't do these steps - their mental models just not ready for a such level of integration into the app and infra. And creation of 3/5/10/100 unverified pull requests just slow down software development process.


All the things you describe are already being done by any team with a modern CI/CD workflow, and none of it requires AI.

At my last job, all of those steps were automated and required exactly zero human input.


Are you sure about "all"? Because I mentioned not only env deployment, but also functional issue reproduction using UI/API, which is also require necessary pre-setup.

Automated tests partially solve the case, but in real world no one writes tests blindly. It's always manual work, and when the failing trajectory is clear - the test is written.

Theoretically agent can interact with UI or API. But it requires deep project understanding, gathered from code, documentation, git history, tickets, slack. And obtaining this context, building an easily accessible knowledge base and puring only necessary parts into the agent context - is still a not solved task.


If your CI/CD process was able to fully verify a fix then it would have stopped the bug from making it to production the first time around and the Jira ticket which was handed to multiple LLMs never would have existed.


There is no fundamental blocker to agents doing all those things. Mostly a matter of constructing the right tools and grounding, which can be fair amount of up-front work. Arming LLMs with the right tools and documentation got us this far. There’s no reason to believe that path is exhausted.


Look at this 18 years old Django ticket: https://code.djangoproject.com/ticket/4140

It was impossible to fix, but it required some experiments and deep research about very specific behaviors.

Or this ticket: https://code.djangoproject.com/ticket/35289

Author proposed one-line solution, but the following discussion includes analysis of RFC, potential negative outcomes, different ways to fix it.

And without deep understanding of the project - it's not clear how to fix it properly, without damage to backward compatibility and neighbor functionality.

Also such a fix must be properly tested manually, because even well designed autotests are not 100% match the actual flow.

You can explore other open and closed issues and corresponding discussions. And this is the complexity level of real software, not pet projects or simple apps.

I guess that existing attention mechanism is the fundamental blocker, because it barely able to process all the context required for a fix.

And feature requests a much, much more complex.


Have you tried building agents? They will go from PhD level smart to making mistakes a middle schooler would find obvious, even on models like gemini-2.5 and o1-pro. It's almost like building a sandcastle where once you get a prompt working you become afraid to make any changes because something else will break.


> Have you tried building agents?

I think the issue right now is so many people want to believe in the moonshot and are investing heavily in it, when the reality is we should be focusing on the home runs. LLMs are a game changer, but there is still A LOT of tooling that can be created to make it easier to integrate humans in the loop.


you can just even tell cursor to use any cli tools you use normally in your development, like git, gh, railway, vercel, node debugging, etc etc


Tools is not the problem. Knowledge is.


> Tools is[sic] not the problem. Knowledge is.

This is the most difficult concept to convey, expressed in a succinct manner rarely found.


The fundamental blocker: Context


Correct! Over at https://ghuntley.com/mcp I propose that each company develops their own tools for their particular codebase that shapes LLM actions on how to work with their codebase.


btw, JetBrains IDEs have built-in HTTP-client. Not so GUI rich, but very convenient for development and no need for a separate if you're already using the IDE https://www.jetbrains.com/help/idea/http-client-in-product-c...


A big plus imo is that you can check in and version control the files alongside your application. Very convenient when working on micro services in a team.


next step: websites add irrelevant text and prompt injections into hidden dom nodes, tags attributes, etc. to prevent llm-based scraping.


It sounds interesting, could you please share the links if it is open sourced?


Nice app, and thank you for OSS!

Am I right, that the following 2 services solve the same product in a similar way: https://github.com/logspace-ai/langflow , https://github.com/FlowiseAI/Flowise ?

It's absolutely ok if the answer is "Yes", I think that in this hot market each product will find a place. And competition is also motivate :)

It would be also nice to add Rivet here: https://github.com/kyrolabs/awesome-langchain#low-code


Thanks!

Yes, I think Rivet, langflow, and flowise all came to the idea of visual programming for LLMs in parallel. I like to think it's a good sign that visual programming is a powerful paradigm here :)

I do think Rivet hits a pretty different use-case. Beyond being in the TypeScript ecosystem, Rivet's remote debugging and embedability are pretty unique, and super critical!


Nice app but I miss realtime rain info. I prefer Yandex Weather: https://yandex.com/weather/maps/nowcast

It has a layer selector with temperature, wind, pressure and precipitation.


Well, it has Radar (for Europe, US and some parts of the rest) & Satellite layer (whole world), which shows you current conditions. It can also show meteo-stations with current values of temperature, pressure and so on.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: