I hope the industry starts competing more on highest scores with lowest tokens, like this. It's a win for everybody: the model is more intelligent, cheaper to run at inference time, and costs less for the end user.
So much bench-maxxing is just giving the model a ton of tokens so it can inefficiently explore the solution space.
It could be. Or just smarter caching (which wouldn't necessarily have to do with model intelligence). Or just overfitting on the 95% most common prompts (which could save tokens but make the models less intelligent/flexible).
With AGI we expect a huge return on investment and GDP growth accelerating at a rate we can't even comprehend. Imagine an algorithm that improves itself each iteration and finds ways to increase its capacity every day. Robots suddenly capable of doing dishes, grocery shopping, picking produce from the field. Imagine all your ailments handled... age becomes just a number.
Also, with AGI we expect a winner-take-all situation. The first AGI system would protect itself against any other AGI system. That's why it's go time for all these AI companies, and why they stopped sharing their research.
I've seen plenty of people look at those metrics and they certainly do tell a story of growing inequality and instability. To me, it seems more obvious that those issues are largely unaddressed by the people in power because they're more concerned with growing their wealth than taking care of their people. I suspect that's obvious to Americans given their overwhelming distrust for institutions, politicians, etc. Unfortunately Americans seem to lack the ability to discern who actually cares about them. By seeking change we've ironically bolstered the opposition to our basic human needs.
Well, to be fair, they only quasi-exist for consumers right now. As far as I can tell, Crucial is the only one providing consumer-accessible lpcamm2 modules on the market at the moment.
Crucial is certainly the only option that comes up searching Amazon or Newegg right now. Lenovo has some OEM modules, but those are clearly marketed as replacement parts for their own laptops; not sure what the warranty and support for them would look like outside a Lenovo product.
But the Crucial brand was unceremoniously sacrificed by Micron to the AI gods at the beginning of this year. So will these lpcamm2 modules even be available once current stock runs out? The 64gb module is already sitting at $1000 on newegg.
Samsung is making lpcamm2 modules but no telling when those will actually hit the market and be accessible.
I was surprised by the bit about Costco selling the outlet-tier trash. I don't currently have a membership, but I've generally understood their position to be quality at cost.
Costco has the same sourcing issues a lot of other companies have. They do spend a lot of effort on due diligence when sourcing Kirkland Signature brand products, but sometimes what they get in a production run ends up being something else, or unexpected issues surface once the product reaches the public.
They are, but the IDE needs to be integrated with them.
Qwen specifically calls out FIM (“fill in the middle”) support on the model card and you can see it getting confused and posting the control tokens in the example here.
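For context, FIM prompting wraps the code before and after the cursor in special control tokens, and when the serving layer isn't configured to treat them as special, they leak into the output exactly as described. A rough sketch of the prompt shape, assuming Qwen2.5-Coder-style token names (check the model card for the exact strings):

```python
# Sketch of a fill-in-the-middle (FIM) prompt. The control-token names below
# match the Qwen2.5-Coder model card; other FIM-trained models use different
# strings, so this is illustrative rather than universal.
prefix = "def add(a, b):\n    "
suffix = "\n    return result"

# The model is asked to generate the code that belongs between prefix and suffix.
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
print(prompt)
```

If the tokenizer/runtime doesn't map these strings to special token IDs, the model sees (and may echo) them as plain text, which is the confusion visible in the example.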
And even among models trained for tool calling and agentic flows, mileage may vary depending on lots of factors. I've been playing around with smaller local models (anything that fits on a 4090 + 64gb RAM) and it seems to be a lottery as to a) whether it works at all and b) how long it keeps working.
Sometimes they don't manage any tool calls and fall over right off the bat; other times they manage a few tool calls and then start spewing nonsense. Some can manage sub-agents for a while, then fall apart. I just can't seem to get consistently decent output on more 'consumer/home pc' type hardware. I've mostly been using either pi or OpenCode for this testing.
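The failure is usually in emitting the structured tool-call payload itself. A sketch of what most agent frameworks expect back from the model, using an OpenAI-style message shape (illustrative; field names vary by runtime):

```python
import json

# Illustrative OpenAI-style tool-call message (field names vary by runtime).
# Smaller models often produce the first few of these correctly, then drift:
# malformed JSON in "arguments", invented function names, or plain prose
# describing the call instead of the structure itself.
tool_call_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "read_file",
                # arguments are a JSON-encoded string, not a nested object
                "arguments": json.dumps({"path": "src/main.py"}),
            },
        }
    ],
}
print(json.dumps(tool_call_message, indent=2))
```

One malformed field anywhere in that structure and the harness either errors out or silently drops the call, which matches the "works for a few calls, then nonsense" pattern.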
I think the unfortunate truth is the simplest. Web development has long been detached from rationality. People are drawn to complexity like moths to a flame.
> People are drawn to complexity like moths to a flame.
Not to complexity, but to abstraction. The more something is abstracted away, the more fungible "developers" become, to the eventual tune of Claude Code.
No one cares that trying to debug a modern application is as hellish as its performance, the KPI that executives go for is employment budget.
It might be really efficient when you "vibe" and don't know exactly what you want.
On serious projects, it feels like even Claude Code could be more efficient with simple technologies, providing near-instant build and debug.
With reduced abstractions and output looking like input, it can better understand how to fix things rather than trying to guess how to manipulate framework state or injecting hacks.
I don't know if Next.js, TanStack, etc are more abstract than Rails, Django, etc. They're undoubtedly more complex though. I also find it hard to believe that it's some sort of conspiracy by management to make developers more fungible. I've seen plenty of developers choose complexity with no outside pressure.
Qwen3-coder-next is way worse than Sonnet 4.5. Also, despite the lack of "coder" in the name, Qwen3.5 is much better at coding than Qwen3-coder-next, so you might want to check that out.
I don't know how well it performs, but you can extend Qwen3.5 to 1 million token context using YaRN. Also, Nemotron 3 Super was recently released and scales up to 1 million token context natively.
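For what it's worth, on past Qwen releases YaRN extension was just a config change rather than a different checkpoint. A sketch of the `rope_scaling` block from the Qwen2.5-style model cards, with placeholder numbers assuming a 256k native context (factor = target length / original length; check the actual Qwen3.5 card for the real values):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

With those placeholder values, 262144 × 4 ≈ 1M tokens. The usual caveat from the model cards applies: static YaRN scaling can degrade quality on short inputs, so it's typically enabled only when long context is actually needed.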
These prices are insane. You can buy all (most?) of the lenses they’re recreating for a fraction of the price and adapt them to a mirrorless camera no problem. I bought a Helios 44-2 recently for $100 and adapted it to my camera for like $15.
There's a difference between adapting for mirrorless versus adapting for cinema. They're not just throwing on an adapter to change the distance to the focal plane, they're actually rehousing the lens. Usually that means adding a de-clicked aperture and reducing focus breathing. These are all primes, but cine lenses are usually parfocal as well.
To your point, none of those things are important if you're just a regular consumer and taking stills, but they're all really nice to have/important if you're working on a film.
> There's a difference between adapting for mirrorless versus adapting for cinema.
The article says they’re adapting to mirrorless cameras
> As reported by CineD, the new Air series of lenses is designed to cater to the growing number of filmmakers who are using compact, lightweight mirrorless bodies for high-end professional work.
> The IronGlass Air lenses move away from IronGlass’ standard PL-mount cinema design toward compact, mirrorless-friendly designs
It doesn't matter what they're being put on. I've put cinema lenses on REDs, DSLRs (5DmkII/t3i back in the day), mirrorless (GH6/BMPCC4K), the works. "Cinema lens" indicates a build type, not what they can mount to, like the de-clicked aperture the previous person mentioned.
For instance, Rokinon released a fantastic cinema lens line for consumer/prosumer cameras in the 2010s; they were rehoused versions of their photo primes, built entirely differently.
Sure there's a difference, but when you can buy the base lens for $100 and 3D print your own cinema housing for $50, it becomes a lot harder to justify IronGlass charging around $2,100 USD for a metal rehousing that is at most $50 worth of materials and a few hours' work per lens.
The "Cinema" industry is notorious for gouging its customers, and this just yet another particularly egregious example of that gouging.
I spent some time trying to understand this paper, and I think calling this a new attention mechanism is a bit misleading. As a dead comment pointed out, it's much closer to RAG. It's not exposing all 100M tokens directly to the model during each prediction. However, the RAG mechanisms have been integrated directly into the model architecture, which means it can have higher accuracy and lower latency. The higher accuracy comes from the fact that it isn't storing text but rather the actual in-memory representations (K/V, compressed tensor representations, routing keys, etc.) of each document, so it can search and utilize them more effectively. And since it's computing up to 100x the usual context space, it, like RAG, cannot process that volume in real time. They explicitly state that the model needs to do offline encoding before handling inference, so you shouldn't expect to just send 100M tokens over an API and start getting a response.
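The offline-encode/online-retrieve split described above is the classic RAG shape. A toy sketch of that split (purely illustrative: the paper stores model-internal representations like K/V and routing keys, not anything as crude as the bag-of-characters encoder used here):

```python
# Toy illustration of the offline/online split in a RAG-style pipeline.
# encode() stands in for the model's offline encoder; the real system caches
# tensor representations, not character counts.

def encode(text):
    # crude bag-of-characters vector, one slot per letter a-z
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# OFFLINE: encode the corpus once and cache the representations.
corpus = [
    "the cat sat on the mat",
    "attention is all you need",
    "retrieval augmented generation",
]
index = [(doc, encode(doc)) for doc in corpus]

# ONLINE: score a query against the cached vectors; the documents themselves
# are never re-encoded at inference time.
def retrieve(query, k=1):
    q = encode(query)
    ranked = sorted(index, key=lambda pair: dot(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("retrieval and generation"))
```

The point of the sketch is just the structure: all the heavy encoding happens before any query arrives, which is why the paper's system can't accept 100M fresh tokens over an API and respond immediately.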
I also think some of the benchmarks are misleading. Taking a RAG system, running an attention benchmark on it, and then comparing it against a model without RAG just isn't fair: it's obviously better, but it's not apples to apples. Some of the benchmarks compare against model+RAG, and there the delta in performance is much smaller.