It also means that you need to extract enough value to cover the cost of said tokens, or reduce the economic benefit of finding exploits.
Reducing economic benefit largely comes down to reducing distribution (breadth) and reducing system privilege (depth).
One way to reduce distribution is to raise the price.
Another is to make a worse product.
Naturally, less valuable software is not a desirable outcome. So either you reduce the cost of keeping the software open (by making it closed), or you increase the price to cover that cost (which, again, also decreases distribution).
The economics of software are going to massively reconfigure in the coming years, open source most of all.
I suspect we'll see more 'open spec' software, with actual source generated on-demand (or near to it) by models. Then all the security and governance will happen at the model layer.
> I suspect we'll see more 'open spec' software, with actual source generated on-demand (or near to it) by models. Then all the security and governance will happen at the model layer.
So each time you roll the dice you gamble on getting a fresh set of 0-days? I don't get why anyone would want this.
You already do this with human-authored code, just slowly.
Project model capabilities out a few years. Even if you only assume linear improvement at some point your risk-adjusted outcome lines cross each other and this becomes the preferred way of authoring code - code nobody but you ever sees.
Most enterprises already HATE adopting open source. They only do it because the economic benefit of free reuse has traditionally outweighed the risks.
If you need a parallel: we already do this today for JIT compilers. Everything is just getting pushed down a layer.
Next, you double click the Excel icon on the desktop, and instead of having Excel installed or a spec of Excel, you have a cloud service with thirty years of Usenet, Quora, StackOverflow, Reddit, PHPBB comments and blog tutorials about how people use Excel, and you wait a few moments while approximately-Excel is rederived from these experiences.
You’ll accept the delay because by then it happens faster than Microsoft can make a splashscreen and window open from a local nvme drive. And because you can customise Excel’s feature set by simply posting a Reddit comment where you hallucinate using a feature that Excel doesn’t have and waiting a couple of days.
[although it can be difficult to find the real Reddit to post on as your web browser will tend to synthesise the experience of visiting any website using a cloud AI model of every website without connecting to the real one at all. This was widely loved as a security measure and since most websites are AI written content on AI written codebases, makes less difference than you’d first think]
It's a mistake to confuse what you're seeing out of today's models with what you'll see out of future ones. We're barely out of the gate on this stuff. We'll borrow what works, and use it to bootstrap something better.
> You already do this with human-authored code, just slowly.
No I don't. I build predictable and deterministic pipelines. If I rebuild from a specific git sha, I expect the same output. If I get something different, I need to fix what's causing that.
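A minimal sketch of that determinism check: two rebuilds from the same git sha should produce byte-identical artifacts, which you can verify by hashing the outputs. The file contents here are simulated stand-ins for real build artifacts.

```python
import hashlib
import os
import tempfile

def digest(path):
    """Return the SHA-256 hex digest of a build artifact."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Simulate two rebuilds from the same git sha producing the same bytes.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "build-a.bin")
    b = os.path.join(d, "build-b.bin")
    for p in (a, b):
        with open(p, "wb") as f:
            f.write(b"same inputs, same output")
    d1, d2 = digest(a), digest(b)

assert d1 == d2  # reproducible: byte-identical artifacts
```

If the hashes ever diverge between rebuilds of the same sha, something nondeterministic (timestamps, unordered iteration, embedded paths) has leaked into the pipeline.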
Nothing precludes you from doing that with AI-gen code vs human-gen code. What you just described is downstream.
If you have a human authoring code, you re-roll every time they release a new version. AI just releases versions faster, and in response to different, faster-moving inputs.
Once upon a time S3 used to cache small objects in their keymap layer, which IIRC had a similar threshold. I assume whatever new caching layer they added is piggybacking that.
This keeps the new caching layer simple and takes advantage of the existing caching. If they went any bigger they'd likely need to rearchitect parts of the keymap or underlying storage layer to accommodate, or else face unpredictable TCO.
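The pattern is roughly an admission-gated LRU: objects over a size threshold are never cached and always fall through to the storage layer. This is only a sketch of the idea; S3's actual keymap cache internals and threshold are not public, so the numbers are illustrative.

```python
from collections import OrderedDict

class SmallObjectCache:
    """LRU cache that only admits objects at or below a size threshold."""

    def __init__(self, max_object_size=1024, capacity=4):
        self.max_object_size = max_object_size
        self.capacity = capacity
        self._store = OrderedDict()

    def put(self, key, value):
        if len(value) > self.max_object_size:
            return False  # too big: bypass cache, serve from storage layer
        self._store[key] = value
        self._store.move_to_end(key)  # mark as most recently used
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return True

    def get(self, key):
        if key in self._store:
            self._store.move_to_end(key)
            return self._store[key]
        return None  # miss: caller falls back to the keymap/storage layer
```

The admission check is what keeps the layer simple: large objects never compete for cache space, so eviction behavior (and cost) stays predictable.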
1) It chews through tokens. If you're on a metered API plan I would avoid it. I've spent $300+ on this just in the last 2 days, doing what I perceived to be fairly basic tasks.
2) It's terrifying. No directory sandboxing, etc. On one hand, it's cool that this thing can modify anything on my machine that I can. On the other, it's terrifying that it can modify anything on my machine that I can.
That said, some really nice things that make this "click":
1) Dynamic skill creation is awesome.
2) Having the ability to schedule recurring and one-time tasks makes it terribly convenient.
3) Persistent agents with remote messaging makes it really feel like an assistant.
> It chews through tokens. If you're on a metered API plan I would avoid it. I've spent $300+ on this just in the last 2 days, doing what I perceived to be fairly basic tasks.
Didn’t Anthropic make it so you can’t use your Claude Code Pro/Max with other tools? Has anyone experienced a block because of that policy while using this tool?
Also really curious what kind of tasks ran up $300 in 2 days? Definitely believe it’s possible. Just curious.
A couple of people on X have posted about their Claude accounts being suspended after using this. All of them seem to have used it with Claude Code, so yes, it looks like it violates their policy (not surprising really, it breaks their TOS).
I've tried it on Codex (ChatGPT Pro) and within an hour of just getting stuff set up and tested used half my weekly limit so I can see using $300 in a couple of days being very easy.
Until that's figured out this is basically a non-starter: you can't use it if it's going to cost $1k+ per week, and I'm not sure there are any local models that'd handle it without $10k+ in hardware costs.
I’ve been working on adapting Claude Code to do some repetitive “personal assistant” type tasks so I was really excited to try this tool.
One of my tasks is a skill that fetches my calendar via MCP and slots events into a JSON to be used for an OR-Tools constraint optimizer that finds a workable schedule for something. It then uploads those events to the calendar using MCP when I choose my favorite candidate solution.
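For readers unfamiliar with the shape of that pipeline: the middle step takes fixed calendar events plus tasks and finds non-overlapping slots. Here's a simplified pure-Python stand-in for that constraint step (the commenter uses OR-Tools for the real solve; this greedy earliest-fit version, with made-up event data, just illustrates the JSON-ish input and output shape).

```python
# Illustrative inputs: fixed calendar events and tasks to slot in.
busy = [(9, 10), (12, 13)]                 # (start hour, end hour)
tasks = [("deep-work", 2), ("email", 1)]   # (name, duration in hours)
day_start, day_end = 8, 17

def schedule(busy, tasks, day_start, day_end):
    """Place each task in the earliest gap that fits, never overlapping
    fixed events or previously placed tasks. A greedy stand-in for a
    real constraint solver like OR-Tools CP-SAT."""
    placed = []
    occupied = sorted(busy)
    for name, dur in tasks:
        t = day_start
        for s, e in occupied:
            if t + dur <= s:      # the task fits before this event
                break
            t = max(t, e)         # otherwise skip past the event
        if t + dur <= day_end:
            placed.append((name, t, t + dur))
            occupied = sorted(occupied + [(t, t + dur)])
    return placed

plan = schedule(busy, tasks, day_start, day_end)
# e.g. [('deep-work', 10, 12), ('email', 8, 9)]
```

A real solver earns its keep once you add preferences and trade-offs (earliest vs. latest, priorities, breaks); the greedy version only shows the feasibility core.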
I checked token usage for this task last time I ran it. It would’ve cost $29 in API usage with Opus 4.5.
So yea, you’re absolutely right that this stuff isn’t going to go mainstream at these rates.
One thing you can try is powering Clawdbot with a local model. My company recently wrote[0] about it.
Unclear what kind of quality you'll get out of it, but since the tokens are all local, kinda doesn't matter if it burns through 10x more for the same outcome.
I offhandedly set it up to do a weather alert every 4 hours during the big winter storm. Absent a well-specified API, I can only assume it was repeatedly doing a bunch of work to access some open API it discovered.
Very much the LLM equivalent of “to bake an apple pie you must first invent the universe”.
Hear hear. Elixir is a dream for this kind of stuff. But it requires very different decisions "all the way down" to make it work outside of BEAM. And BEAM itself feels heavy to most systems devs.
(IMO it's not for many use cases, and to the extent it is I'm happy to see things like AtomVM start to address it.)
This is a 'tipping point' situation. Exodus will be a little at a time, then all at once.