I hope the industry starts competing more on highest scores with lowest tokens, like this. It's a win for everybody: the model is more intelligent, cheaper to run at inference time, and costs less for the end user.
So much bench-maxxing is just giving the model a ton of tokens so it can inefficiently explore the solution space.
It could be. Or just smarter caching (which wouldn't necessarily have to do with model intelligence). Or just overfitting on the 95% most common prompts (which could save tokens but make the models less intelligent/flexible).
With AGI we expect a huge return on investment and GDP growth accelerating at a rate we can't even comprehend. Imagine an algorithm that improves itself each iteration and finds ways to increase its capacity every day. Robots suddenly capable of doing dishes, grocery shopping, picking produce from the field. Imagine all your ailments handled... age becomes just a number.
Also, with AGI we expect a winner-take-all situation. The first AGI system would protect itself against any other AGI system. That's why it's go time for all these AI companies, and why they stopped sharing their research.
I've seen plenty of people look at those metrics and they certainly do tell a story of growing inequality and instability. To me, it seems more obvious that those issues are largely unaddressed by the people in power because they're more concerned with growing their wealth than taking care of their people. I suspect that's obvious to Americans given their overwhelming distrust for institutions, politicians, etc. Unfortunately Americans seem to lack the ability to discern who actually cares about them. By seeking change we've ironically bolstered the opposition to our basic human needs.
Well, to be fair, they only quasi-exist for consumers right now. As far as I can tell, Crucial is the only one providing consumer-accessible lpcamm2 modules on the market at the moment.
Crucial is certainly the only option that comes up searching Amazon or Newegg right now. Lenovo has some OEM modules, but those are clearly marketed as replacement parts for their own laptops; not sure what the warranty and support for them would look like outside a Lenovo product.
But the Crucial brand was unceremoniously sacrificed by Micron to the AI gods at the beginning of this year. So will these lpcamm2 modules even be available once current stock runs out? The 64gb module is already sitting at $1000 on newegg.
Samsung is making lpcamm2 modules but no telling when those will actually hit the market and be accessible.
I was surprised by the bit about Costco selling the outlet-tier trash. I don't currently have a membership, but I've generally understood their position to be quality at cost.
Costco has the same sourcing issues a lot of other companies have. They do spend a lot of effort on due diligence when sourcing Kirkland Signature brand products, but sometimes what they get in a production run ends up being something else, or unexpected issues surface once the product reaches the public.
They are, but the IDE needs to be integrated with them.
Qwen specifically calls out FIM (“fill in the middle”) support on the model card and you can see it getting confused and posting the control tokens in the example here.
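For context, FIM prompting wraps the code before and after the cursor in special control tokens, and when the serving layer isn't configured to treat them as special, they leak into the output exactly as described. A rough sketch of the prompt shape, assuming Qwen2.5-Coder-style token names (check the model card for the exact strings):

```python
# Sketch of a fill-in-the-middle (FIM) prompt. The control-token names below
# match the Qwen2.5-Coder model card; other FIM-trained models use different
# strings, so this is illustrative rather than universal.
prefix = "def add(a, b):\n    "
suffix = "\n    return result"

# The model is asked to generate the code that belongs between prefix and suffix.
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
print(prompt)
```

If the tokenizer/runtime doesn't map these strings to special token IDs, the model sees (and may echo) them as plain text, which is the confusion visible in the example.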
And even among models trained for tool calling and agentic flows, mileage may vary depending on lots of factors. I've been playing around with smaller local models (anything that fits on a 4090 + 64gb RAM) and it seems to be a lottery as to a) whether it works at all and b) how long it keeps working.
Sometimes they don't manage any tool calls and fall over right off the bat; other times they manage a few tool calls and then start spewing nonsense. Some can manage sub-agents for a while, then fall apart. I just can't seem to get consistently decent output on more 'consumer/home pc' type hardware. I've mostly been using either pi or OpenCode for this testing.
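The failure is usually in emitting the structured tool-call payload itself. A sketch of what most agent frameworks expect back from the model, using an OpenAI-style message shape (illustrative; field names vary by runtime):

```python
import json

# Illustrative OpenAI-style tool-call message (field names vary by runtime).
# Smaller models often produce the first few of these correctly, then drift:
# malformed JSON in "arguments", invented function names, or plain prose
# describing the call instead of the structure itself.
tool_call_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "read_file",
                # arguments are a JSON-encoded string, not a nested object
                "arguments": json.dumps({"path": "src/main.py"}),
            },
        }
    ],
}
print(json.dumps(tool_call_message, indent=2))
```

One malformed field anywhere in that structure and the harness either errors out or silently drops the call, which matches the "works for a few calls, then nonsense" pattern.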
I think the unfortunate truth is the simplest. Web development has long been detached from rationality. People are drawn to complexity like moths to a flame.
> People are drawn to complexity like moths to a flame.
Not to complexity, but to abstraction. The more something is abstracted away, the more fungible "developers" become, to the eventual tune of Claude Code.
No one cares that trying to debug a modern application is as hellish as its performance, the KPI that executives go for is employment budget.
It might be really efficient when you "vibe" and don't know exactly what you want.
On serious projects, it feels like even Claude Code could be more efficient with simple technologies, providing near-instant build and debug.
With reduced abstractions and output looking like input, it can better understand how to fix things rather than trying to guess how to manipulate framework state or injecting hacks.
I don't know if Next.js, TanStack, etc are more abstract than Rails, Django, etc. They're undoubtedly more complex though. I also find it hard to believe that it's some sort of conspiracy by management to make developers more fungible. I've seen plenty of developers choose complexity with no outside pressure.
Qwen3-coder-next is way worse than Sonnet 4.5. Also, despite the lack of "coder" in the name, Qwen3.5 is much better at coding than Qwen3-coder-next, so you might want to check that out.
I don't know how well it performs, but you can extend Qwen3.5 to 1 million token context using YaRN. Also, Nemotron 3 Super was recently released and scales up to 1 million token context natively.
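For what it's worth, on past Qwen releases YaRN extension was just a config change rather than a different checkpoint. A sketch of the `rope_scaling` block from the Qwen2.5-style model cards, with placeholder numbers assuming a 256k native context (factor = target length / original length; check the actual Qwen3.5 card for the real values):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

With those placeholder values, 262144 × 4 ≈ 1M tokens. The usual caveat from the model cards applies: static YaRN scaling can degrade quality on short inputs, so it's typically enabled only when long context is actually needed.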
These prices are insane. You can buy all (most?) of the lenses they’re recreating for a fraction of the price and adapt them to a mirrorless camera no problem. I bought a Helios 44-2 recently for $100 and adapted it to my camera for like $15.
There's a difference between adapting for mirrorless versus adapting for cinema. They're not just throwing on an adapter to change the distance to the focal plane, they're actually rehousing the lens. Usually that means adding a de-clicked aperture and reducing focus breathing. These are all primes, but cine lenses are usually parfocal as well.
To your point, none of those things are important if you're just a regular consumer and taking stills, but they're all really nice to have/important if you're working on a film.
> There's a difference between adapting for mirrorless versus adapting for cinema.
The article says they’re adapting to mirrorless cameras
> As reported by CineD, the new Air series of lenses is designed to cater to the growing number of filmmakers who are using compact, lightweight mirrorless bodies for high-end professional work.
> The IronGlass Air lenses move away from IronGlass’ standard PL-mount cinema design toward compact, mirrorless-friendly designs
It doesn't matter what they're being put on. I've put cinema lenses on REDs, DSLRs (5DmkII/t3i back in the day), mirrorless (GH6/BMPCC4K), the works. "Cinema lens" indicates a build type, not what they can mount to, like the de-clicked aperture the previous person mentioned.
For instance, Rokinon released a fantastic cinema lens line for consumer/prosumer cameras in the 2010s; they were rehoused versions of their photo primes, built entirely differently.
Sure there's a difference, but when you can buy the base lens for $100 and 3D print your own cinema housing for $50, it becomes a lot harder to justify IronGlass charging around $2,100 USD for a metal rehousing that is at most $50 worth of materials and a few hours' work per lens.
The "Cinema" industry is notorious for gouging its customers, and this just yet another particularly egregious example of that gouging.
I spent some time trying to understand this paper, and I think calling this a new attention mechanism is a bit misleading. As a dead comment pointed out, it's much closer to RAG. It's not exposing all 100M tokens directly to the model during each prediction. However, the RAG mechanisms have been integrated directly into the model architecture, which means it can have higher accuracy and lower latency. The higher accuracy comes from the fact that it isn't storing text but rather the actual in-memory representations (K/V, compressed tensor representations, routing keys, etc.) of each document, so it can search and utilize them more effectively. And since it's computing up to 100x the usual context space, it, like RAG, cannot process that volume in real time. They explicitly state that the model needs to do offline encoding before handling inference, so you shouldn't expect to just send 100M tokens over an API and start getting a response.
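The offline-encode/online-retrieve split described above is the classic RAG shape. A toy sketch of that split (purely illustrative: the paper stores model-internal representations like K/V and routing keys, not anything as crude as the bag-of-characters encoder used here):

```python
# Toy illustration of the offline/online split in a RAG-style pipeline.
# encode() stands in for the model's offline encoder; the real system caches
# tensor representations, not character counts.

def encode(text):
    # crude bag-of-characters vector, one slot per letter a-z
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# OFFLINE: encode the corpus once and cache the representations.
corpus = [
    "the cat sat on the mat",
    "attention is all you need",
    "retrieval augmented generation",
]
index = [(doc, encode(doc)) for doc in corpus]

# ONLINE: score a query against the cached vectors; the documents themselves
# are never re-encoded at inference time.
def retrieve(query, k=1):
    q = encode(query)
    ranked = sorted(index, key=lambda pair: dot(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("retrieval and generation"))
```

The point of the sketch is just the structure: all the heavy encoding happens before any query arrives, which is why the paper's system can't accept 100M fresh tokens over an API and respond immediately.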
I also think some of the benchmarks are misleading. Taking a RAG system, running an attention benchmark on it, and then comparing it against a model without RAG just isn't fair: it's obviously better, but it's not apples to apples. Some of the benchmarks compare against model+RAG, and there the delta in performance is much smaller.