Yeah this happens to me all the time! I have a separate session for discussing and only apply edits in worktrees / subagents to clearly separate discuss from work and it still does it
Sure, humans make mistakes... but rarely, vanishingly rarely about commands they use often. Are you going to make a non-typo kind of mistake when typing `ls -l`? AI hallucinations don't happen all the time, but they happen so much more often than "vanishingly rarely".
That's why you can't just vibe-code something and expect it to work 100% correctly with no design flaws, you need to check the AI's output and correct its mistakes. Just yesterday I corrected a Claude-generated PR that my colleague had started, but hadn't had time to finish checking before he went on vacation. He'd caught most of its mistakes, but there was one unit test that showed that Claude had completely misunderstood how a couple of our services are intended to work together. The kind of mistake a human would never have made: a novice wouldn't have understood those services enough to use them in the first place, and an expert would have understood them and how they are supposed to work together.
You always, always, have to double-check the output of LLMs. Their error rate is quite low, thankfully, but on work of any significant size their error rate is pretty much never zero. So if you don't double-check them then you're likely to end up introducing more bugs than you're fixing in any given week, leading to a codebase whose quality is slowly getting worse.
Every day users? Probably not many. It forcibly disables lots of nice-to-have features.
But users who need a highly secure phone? It’s entirely possible to use the phone without media embeds in iMessage, or shared photo albums, or websites loading in 900 fonts. It’s a trade off likely worth making in some situations.
You can make a shared photo album with family members. It’s everyone else that is problematic with the feature enabled. In my case I only want to share with my wife and son so it wasn’t a detractor for me.
I’ve used it on my personal iPhone since the feature was released. The impact to my life has been minor. I can’t share some thing with my wife in the health app and my son can’t SharePlay with me in the car while I use CarPlay.
I use Preact without reactivity. That way we can have familiar components that look like React (including strong typing, Typescript / TSX), server-side rendering and still have explicit render calls using an MVC pattern.
View triggers an event -> Controller receives event, updating the model as it sees fit -> Controller calls render to update views
Model knows nothing about controller or views, so they're independently testable. Models and views are composed of a tree of entities (model) and components (views). Controller is the glue. Also, API calls are done by the controller.
So it is more of an Entity-Boundary-Control pattern.
From what I can tell, they do full page reloads when visiting a different page, and use Preact for building UIs using components. Those components and pages then get rendered on the server as typical template engines.
Our approach is actually very cost-effective compared to alternatives. Our browser uses a token-efficient LLM-friendly representation of the webpage that keeps context size low, while also allowing small and efficient models to handle the low-level navigation. This means agents like Claude can work at a higher abstraction level rather than burning tokens on every click and scroll, which would be far more expensive
are your evals / comparisons publicly/3rd party reproducible?
If it's "trust me, I did a fair comparison", that's not going to fly today. There's too much lying in society, trusting people trying to sell you something to be telling the truth is not the default anymore, skepticism is
I'm paying a fixed amount on Claude and other agents, so "more tokens" is "free" for me. There's a lot of niche tools out there but I think we all have "subscription fatigue".
But maybe that's just me - Maybe im just not your target audience :)
reply