Until there is some drastic new hardware, we are going to see a situation similar to proof of work, where a small group hoards the hardware and can collude on prices.
The difference is that the current prices have a lot of subsidies from OPM (other people's money).
Once the narrative changes to something more realistic, I can see prices increasing across the board. Forget $200/month for Codex Pro; expect $1000/month or something similar.
So it's a race between new hardware supply and paradigm shifts hitting the market versus the tide going out in the financial markets.
For inference, there is already a 10x improvement possible over a setup based on NVIDIA server GPUs, but volume production, etc... will take a while to catch up.
During inference the model weights are static, so they can be stored in High Bandwidth Flash (HBF) instead of High Bandwidth Memory (HBM). Flash chips are being made with over 300 layers and they use a fraction of the power compared to DRAM.
NVIDIA GPUs are general purpose. Sure, they have "tensor cores", but that's a fraction of the die area. Google's TPUs are much more efficient for inference because they're mostly tensor cores by area, which is why Gemini's pricing is undercutting everybody else despite being a frontier model.
New silicon process nodes are coming from TSMC, Intel, and Samsung that should roughly double the transistor density.
There are also algorithmic improvements like the recently announced Google TurboQuant.
Not to mention that pure inference doesn't need the crazy fast networking that training does, or the storage, or pretty much anything other than the tensor units and a relatively small host server that can send a bit of text back and forth.
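To put rough numbers on the weight-storage point above: at low batch sizes, decoding is memory-bandwidth bound, because every weight has to be streamed once per generated token. Here's a back-of-the-envelope sketch in Clojure; all figures are purely illustrative assumptions, not vendor numbers.

    ;; Back-of-the-envelope decode bound; all numbers are illustrative assumptions.
    ;; At batch size 1, every parameter is read once per generated token, so
    ;; tokens/sec is capped by (memory bandwidth) / (bytes of weights streamed).
    (def params      70e9) ; hypothetical 70B-parameter dense model
    (def bytes-per-w 1)    ; assume 8-bit quantized weights
    (def bandwidth   3e12) ; assume ~3 TB/s aggregate memory bandwidth

    (defn max-tokens-per-sec [bw]
      (/ bw (* params bytes-per-w)))

    (max-tokens-per-sec bandwidth)        ;=> ~43 tokens/sec ceiling at batch size 1
    (* 64 (max-tokens-per-sec bandwidth)) ;=> aggregate tokens/sec for a batch of 64,
                                          ;   compute permitting - same weight stream

Batching amortizes that same weight stream over many users, which is why read-mostly, high-capacity storage for static weights is attractive in the first place.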
> Flash chips are being made with over 300 layers and they use a fraction of the power compared to DRAM.
Isn't reading from flash significantly more power intensive than reading DRAM? Anyway, the overhead of keeping weights in memory becomes negligible at scale, because you're running large batches and sharding a single model over a large number of GPUs. (And that needs the crazy-fast networking to make it work; you get too much latency otherwise.)
For a given capacity of memory, Flash uses far less power than DRAM, especially when used mostly for reads.
> becomes negligible at scale
Nothing is negligible at scale! Both the cost and the power draw of HBM are limiting factors for the hyperscalers, to the point that Sam Altman (famously!) cornered the market and locked in something like 40% of global RAM production, driving up prices for everyone.
> sharding a single model over large amounts of GPUs
A single host server typically has 4-16 GPUs directly connected to the motherboard.
Part of the reason for sharding models across multiple GPUs is that their weights don't fit into the memory of any one card! HBF could be used to give each GPU/TPU well over a terabyte of capacity for weights.
Last but not least, the context cache needs to be stored somewhere "close" to the GPUs. Across millions of users, that's a lot of unique data with a high churn rate. HBF would allow the GPUs to keep that "warm" and ready to go for the next prompt at a much lower cost than keeping it around in DRAM and having to constantly refresh it.
> For a given capacity of memory, Flash uses far less power than DRAM, especially when used mostly for reads.
Flash has essentially no idle power, being non-volatile (whereas DRAM needs constant refresh), but the active power to read a block of a given size is significantly higher for Flash. You can still use Flash profitably, but only for rather sparse and/or low-intensity reads. That probably fits things like MoE layers, if the MoE is sparse enough.
Also, you can't really use flash memory (especially soldered-in HBF) for ephemeral data like the KV context for a single inference; it wears out way too quickly.
Modern flash memory, with multi-bit cells, indeed requires more power for reading than DRAM, for the same amount of data.
However, for old-style 1-bit per cell flash memory I do not see any reason for differences in power consumption for reading.
Different array designs and sense amplifier designs and CMOS fabrication processes can result in different power consumptions, but similar techniques can be applied to both kinds of memories for reducing the power consumption.
Of course, storing only 1 bit per cell instead of 3 or 4 greatly reduces the density and cost advantages of flash memory, but what remains may still be enough for what inference needs.
The basic physics of reading from Flash vs. DRAM are broadly similar, and it's true that reading from SLC flash is a bit cheaper, but you still need much higher voltages and longer read times for flash compared to DRAM. It's not really the same.
Doubtful; local models are the competitive future that will keep prices down.
128GB is all you need.
A few more generations of hardware and open models, and people will be pretty happy doing whatever they need on their laptop locally, with big SOTA models left for special purposes. There will be a pretty big bubble burst when there aren't enough customers willing to pay the $1000/month per seat needed to sustain the enormous datacenter models.
Apple will win this battle and nvidia will be second when their goals shift to workstations instead of servers.
Weird how you're leaving stuff like Strix Halo out. Also weird that you think 128GB is the future, when so much of the current research is targeting something around 12GB. I assume we'll end up with fewer general-purpose models and more specific small ones swapped out for whatever work you're asking to do.
Batch inference is much more efficient. Using the hardware round the clock is much more efficient. Cloud can absolutely pay more for hardware and still make money off you.
Cloud can pay more for RAM until all the RAM producers withdraw from the consumer market, then prices will go back down.
End users will still get access to RAM. The cloud terminal they purchase from Apple, Google, Samsung, or HP will have all the RAM it will ever need directly soldered onto it.
Doesn’t Apple place RAM directly into the SoC package? We aren’t even talking about soldering it to motherboards anymore; it comes in the package with the CPU, just as it would with a GPU.
The next step, I think, will be a "cash for clunkers" program to permit people to trade in old computer hardware to the government—especially since operating systems that do not collect KYC data on their users will soon be illegal to operate.
More like RAM producers are providing supplies to the highest bidder, no? If this doesn't peter out supply will normalize at a higher but less insane price eventually.
So I have to ask: what is the pricing? This solves a major annoyance of having to use Resend. I just want to be able to send some confirmation emails, but I have to pay extra for each domain I add. I am ready to migrate to Cloudflare Email Service if the pricing is competitive.
It's next to impossible to get approved on Amazon SES now vs. 10 years ago. I don't know why it's so ridiculously difficult to use it for transactional verification emails.
I see AWS screwing up and Cloudflare replacing it.
"a lot of East Asians" if you exclude China and those on the far left end of the political spectrum, you have the rest who adore Japan.
Unfortunately the anti-Japanese folks are the loudest in the room and loves to manipulate online opinions that that is the mainstream consensus.
They don't pause to ask, who benefits from this manufactured crisis that is aimed at driving a wedge between Korea and Japan in particular, both close US allies.
I think the hype around Qwen and even Gemma4, often floated for views/attention, glosses over the fact that these models have clear gaps compared to what closed models offer.
In short, it has its uses, but it would not (and should not) be the main driver. Will it get better? I'm sure of it. But there is too much hype and exaggeration around open source models; for one, the hardware simply isn't available at a price point where we can run something that can seriously compete with today's closed models.
If we got something like GPT-5.4-xhigh that could run on local hardware under $5k, that would be a major milestone.
I say "if we got $CURRENT_MODEL that can run under local hardware" claims are postproning BS.
What is gonna happen when that happens? They are gonna cry they need GPT-$CURRENT capabilities locally.
Now we have local models that are way better that GPT-2 (careful, this one is way too dangerous for release!) GPT3.5, in some ways better that 4, and can run on reasonably modest hardware.
As you demonstrated, AI is not needed to write slop, and just because AI is involved doesn't make something slop. We are still very much in control, even if it is generated.
Clojure is more relevant than ever in the age of agentic coding because of immutability and the REPL. The two big problems with agentic coding are context growing in an unbounded fashion and agents not being able to get quick feedback on what they're doing. Mainstream languages fail on both counts. I've found Clojure has been a great fit for keeping agents on track.
It definitely is more "mainstream" than others but I just don't see the same level of attention and enthusiasm around it anymore. I'm sure it is still being used in many places but like Elixir, hiring remains on the tough end.
Hiring good talent was always problematic. This has nothing to do with the quality, capacity, and robustness of the language or its relevance. Hiring people who would love to use Clojure but have no prior experience is not that difficult - it's just that every company wants an expert, but they don't want to offer expert salaries. In places where they do, the competition is nuts, on top of that, experienced Clojuristas typically get interviewed and scrutinized with the same level of rigor as architects.
The industry should have optimized for hiring people interested in PLs like Clojure instead of LeetCode drillers. Clojure is rarely the first, second, or even third programming language people choose to learn. It demands a specific vision, dedication, and discipline that fundamentally transforms how people think about computation, data flow, distributed systems, and concurrency. The ROI from hiring an average developer experienced in Clojure has the potential to significantly exceed that of a typical hire. Even when there's zero Clojure in prod.
I've had Clojure on my resume for 10 years, mainly to see if anyone would ask about it. Nobody ever has, until an interview a couple days ago. We'll see if it actually helps in leading to an offer, I guess.
I have the opposite experience - I've been using Clojure for over a decade and it feels like that's the only thing that mattered for the last five jobs, even though it's really just one of many layers required to do the job. I honestly would love to find a non-clj team and convince them to use it. There are so many useful scripts we write in Babashka alone, it just seems wasteful not to use that path while knowing it exists.
Imagine whatever you'd use random shell scripts, a Makefile/Justfile, or whatever "scripts" mechanism the language offers (if any) for, but written in Clojure instead and run with Babashka.
Anything that we previously used Bash or Python for - any complex task delegation from GHA; utility scripts for setting up proper ssh tunneling for various k8s clusters; there's also a pretty complex CLI tool we built for testing our services in ephemeral SDEs running our pods.
Personally: all my MCPs are written in Clojure - https://github.com/agzam/death-contraptions; I write small web-scraping scripts and automations in nbb with Playwright. The flexibility of starting the REPL and figuring out the rest of it dynamically, while poking through DOM elements directly from the editor is some blackmagicfuckery that should be outlawed. Imagine being able to control the state of the web browser while typing some incantations in your editor, without much ceremony, without some crazy scaffolding, without "frameworks", without even having to save the code into a file. You gotta be some ignorant fool who doesn't know this was at all possible or a complete idiot to say "meh, but all these parentheses". You gotta be kidding me. It's like if someone gave you a magic car attachment that makes it run for 800 miles on a single charge and you'd say: "meh, I don't like the color of it"...
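For anyone who hasn't seen the Babashka style being described, here's a minimal sketch of the kind of utility script mentioned above. The cluster, namespace, and service names are all made up for illustration; babashka.process ships with bb.

    ;; Minimal Babashka sketch of a k8s port-forward helper. Names are hypothetical.
    (require '[babashka.process :refer [shell]])

    (defn port-forward
      "Forward a local port to a service in the given k8s context."
      [{:keys [context namespace service local-port remote-port]}]
      (shell "kubectl" "--context" context "-n" namespace
             "port-forward" (str "svc/" service)
             (str local-port ":" remote-port)))

    ;; Hypothetical invocation, e.g. from a bb task or the REPL:
    ;; (port-forward {:context "dev-cluster" :namespace "payments"
    ;;                :service "api" :local-port 8080 :remote-port 80})

The same script-sized function can be called from CI, a bb task, or interactively from the REPL, which is most of the appeal over one-off Bash.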
Clojure has lousy error messages; agents deal with this well.
Clojure is capable of producing some of the most dense code I’ve ever seen, so manual code reviews really start to feel like a bottleneck unless your goal is to level up.
> Clojure is capable of producing some of the most dense code I’ve ever seen, so manual code reviews really start to feel like
For me it's the opposite: the dense code is easier to review, because the proposed changes are almost always smaller and more informative. Contrast that with a change in a typical TypeScript project, where changes propagate across tens of files that you need to jump between just to understand the context. In the time it takes me to ramp up on what that change is, I've already completed the review of a change in a Clojure program.
Not to mention that ClojureScript often emits safer code than TypeScript does. Sounds insane and counter-intuitive, but here's the thing - TypeScript actually removes all the type information from the emitted JS, while Clojure, being strongly typed, retains its runtime type guarantees in the compiled JS code. So all that enormous effort required to deal with complex types in practice feels like bringing kata choreography to a street fight - it's not utterly useless by itself, but it's hardly helping in a real fight-or-flight situation. You can impress the attacker with your beautiful dance and maybe even prevent them from attacking you, but that's more hope than a real strategy.
I would say dense code tends to help code reviews. It's just a bit unintuitive to spend minutes looking at a page of code when you're used to spending a few seconds per page in more verbose languages.
I also find it easier to just grab the dense code and interactively play with it than to do that with 40 pages of code.
I work on a large Clojure codebase with AI, and I'm getting excellent results. Likely factors are code density, the resulting token density, and a lot of well-architected code that the AI can follow — I'm not sure exactly, but the results are really good.
Yeah I see this at work - people are sceptical that AI + Clojure even works
but in my experience it's amazing. The overall quality of Clojure code in the wild tends to be higher than in your typical language, so the AI's training on Clojure tends to be on modern, high-quality code, and the language is very token efficient. You can also tell the AI to interact with the REPL to avoid restarts.
The only downside I've seen reported is mismatched parens, but for me models have been strong enough to balance parens for about a year at least; it's not something I actively work around, even though there are workarounds like brepl and others.
FWIW, I never get mis-matched parens when working with Claude Code in Clojure. I guess the key is to use clojure-mcp tools. If there ever is a mismatch during work (rarely, but I've seen it happen), claude fixes it using the tools right away.
I find it actually the best substrate to write AI tooling. All my custom MCPs are written in Clojure (bb). You hook up the agent to the REPL and let it go wild - it builds something nice. Also, Clojure is one of the most token efficient PLs.
One of the main problems I have with models coding is that the feedback loop sits way down the chain from generation: for Python it's out at the commit boundary when your hooks are running, or maybe at the point where the model wants to push a PR. The REPL lets that feedback happen during generation, and the other safety measures help immensely. Immutable data, STM, all of the features in Clojure that gave devs superpowers now do the same for a model.
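As a concrete, minimal illustration of that loop (the function and data are made up): define a pure function over immutable data, then evaluate forms against the live REPL while the code is being generated, instead of waiting for a hook or a PR.

    ;; A pure function over immutable data; nothing here mutates its input.
    (defn apply-discount
      "Return the order with its total reduced by pct (0.0-1.0)."
      [order pct]
      (update order :total #(* % (- 1 pct))))

    (comment
      ;; An agent (or a human) can send these forms to the running REPL and read
      ;; the returned values as immediate feedback on the change it just wrote.
      (apply-discount {:id 1 :total 100.0} 0.10) ;=> {:id 1, :total 90.0}
      (apply-discount {:id 1 :total 100.0} 0.0)  ;=> {:id 1, :total 100.0}
      )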
Ask AI to build something. By default it will use python. Sometimes js or typescript.
Ask it to do the same thing in Clojure. The result is generally an order of magnitude better: shorter, easier to read for both humans and LLMs, easier to adapt to changes.
Not to mention that changes are easier to review, because there are fewer of them. The same semantic diff for JavaScript and Clojure looks vastly different, and I'd favor reviewing the Clojure changes any day of the week.