Hacker News | ZeroCool2u's comments

Zed is excellent. I know it's weird, but the last thing holding me back is not being able to have a browser-based Zed session the way I can with VSCode.

A crucial factor tech industry folks tend to ignore is how much executives value predictable costs. Cloud migrations got away with unpredictable costs, though their advocates still had to argue fiercely, because 'the cloud' and its serverless tech had the potential to significantly decrease overall spend for unpredictable, bursty workloads.

The usual counter-argument is the operational burden, but human capital is also a relatively fixed cost. A dedicated team of 3-5 FTEs could probably handle inference ops for an F500 company.

Meanwhile, the capability delta is shrinking fast. The release of DeepSeek v4 is further evidence that local open-source models are viable, and the industry is only trending further in this direction, especially as we rely more on test-time compute and task-specific harnesses rather than model size.

So, if you're an executive looking at a marginal but fixed operations cost, added flexibility, and a rapidly closing gap in capability, why wouldn't you just run open-source models on your own infrastructure to get those highly predictable costs? Plus, you decrease the risk of one of the frontier


Do you really want to buy the 3rd or 4th most intelligent AI?

There’s so much uncertainty that the safe option seems to be giving everyone a Claude or OpenAI subscription/API key until the frontier isn’t changing every six months.


Interestingly, Anthropic uses Mintlify for their docs. Not Stainless. Obviously, the focus is on SDK generation, but still strange.


Anthropic uses Stainless Docs for the API reference. It’s a custom integration that embeds the Stainless Docs React components directly in the Claude dashboard application.

(I worked on the Stainless Docs product at Stainless and implemented support for Anthropic’s embedding use case)


Anthropic technically uses the Stainless Docs platform for their docs, in that it’s all rendered by Stainless components. They just don’t use the full suite of Stainless tools for docs. The ability to use as little or as much as you like was a great feature of the Stainless Docs product.


It is frustrating. I really enjoyed my Valve Index and want a replacement, and Meta has some of the best VR tech in the world, but I've waited 6 years for Valve to release their new headset instead, simply because Meta can't be trusted.


Well at least we know it’s coming.


Steam Frame soon™


There was a time when Google could've been competitive in this space, specifically against Apple's MacBook product line, but that has long since passed. The third-party manufacturer path means Google isn't committed to this and won't have competitive hardware. It'll just be another Chromebook and limited to the Google Play Store too, which just isn't good at this point.


> and limited to the Google Play Store too, which just isn't good at this point.

Care to elaborate? I have no idea what you're talking about here.


The quality of apps in the Google Play Store has dropped massively. There are still some gems, but for better or worse, the ecosystem is simply not as strong as Apple's, and it's certainly not comparable to just having a device where you can install anything you'd like in a full desktop-grade OS.


Bedrock is more expensive, less feature-complete, and less reliable, judging by the raw volume of 500 errors.


An interesting side effect of this is that Google Cloud may now be the only hype scaler that can resell all 3 of the labs' models? Maybe I'm misinterpreting this, but that would be a notable development, and I don't see why Google would allow Gemini to be resold through any of the other cloud providers.

Might really increase the utility of those GCP credits.


Might not be good for Gemini long term if Anthropic and OpenAI can and will sell in every cloud provider they can find but businesses can only use Gemini via Google Cloud.


Good for Google Cloud, bad for Gemini = ??? for Google


Except Gemini might end up being far cheaper per token due to the infrastructure advantage


Do we have proof that it's cheaper in terms of $/token/intelligence?


I think the public pricing usually has it cheaper (relatively). Obviously, since AI is constantly evolving, it's not going to compare as favourably the farther we get from a major Gemini release.

I was mainly referring to the TPU hardware advantage + GCP running and designing their own datacenter stack.


Do TPUs actually have an advantage over Nvidia GPUs?


How is it good for Gemini that it's not available on two out of three major cloud platforms?


It isn't. That's why I said "might not be good for Gemini".


Oof, I completely missed that "not", thanks.


"hype scaler" indeed!


That will likely mean the end of Gemini models...


Benchmarks are favorable enough that they're comparing to non-OpenAI models again. Interesting that tokens/second is similar to 5.4. Maybe there's some genuine innovation beyond "bigger model, better" this time?


It's behind Opus 4.7 in SWE-Bench Pro, if you care about that kind of thing. It seems on-trend, even though benchmarks are less and less meaningful for the stuff we expect from models now.

Will be interesting to try.


Whenever we get the locally runnable 4k models things are going to get really awkward for the big 3 labs. Well at least Google will still have their ad revenue I guess.


Given how little Claude usage they've been giving us on the "pro" plan lately, I've started doing more with the various open Qwen3.* models. Both Qwen3-coder-next and Qwen3.5-27b have been giving me good results, and their 3.6 models are starting to be released. I think Anthropic may be shooting themselves in the foot here as more people start moving to local models due to cost and/or availability. Are the Qwen models as good as Claude right now? No. But they're getting close to where Claude Sonnet was 9 months to a year ago (prior to 4.5, around 4.0). If I need some complex planning, I save that for Claude and have the Qwen models do the implementation.


I was thinking the exact same thing just now as I load Qwen3.6 into Hermes Agent, all while fantasizing that it will replace Opus 4.7. It might not, actually, but it seems like we're on the verge of that.

Lately I've been wondering too just how large these proprietary "ultra powerful frontier models" really are. It wouldn't shock me if the default models are actually just some kind of crazy MoE thing with only a very small number of active params but a huge pool of experts to draw from for world knowledge.


I've also been using the Qwen3.5-27B and the new Qwen3.6 locally, both at Q6. I don't agree that they're as good as pre-Opus Claude. I really like how much they can do on my local hardware, but we have a long way to go before we reach parity with even the pre-Opus Claude in my opinion.


I run Qwen 3.5 122B-A10B on my MacBook Pro, and in my experience its capability level for programming and code comprehension tasks is roughly that of Claude Sonnet 3.7. Honestly I find that pretty amazing: something with capability roughly equivalent to the frontier models of a year ago, running locally on my laptop for free. I’m eager to try Qwen 3.6 122B-A10B when it’s released.


What hardware do you use? I want to experiment with running models locally.


OP’s Qwen3.6 27B Q6 seems to run north of 20GB on Hugging Face, and should function on an Apple Silicon Mac with 32GB RAM. Smaller models work unreasonably well even on my M1/64GB MacBook.

I am getting 10tok/sec on a 27B of Qwen3.5 (thinking, Q4, 18GB) on an M4/32GB Mac Mini. It’s slow.

For a 9B (much smaller, non-thinking) I am getting 30tok/sec, which is fast enough for regular use if you need something from the training data (like how to use grep or Hemingway's favorite cocktail).

I’m using LM Studio, which is very easy and free (as in beer).
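
For a quick sanity check on whether a model fits, here is a rough back-of-the-envelope sketch. It assumes the weights take about (parameters × bits per weight / 8) bytes, which is only a lower bound: real quantized files tend to be larger than the nominal bit width suggests (the 18GB Q4 file above is well over this estimate). The function names and the 8GB headroom for the OS, context cache, and runtime are my own assumptions, not anything measured.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough lower bound on quantized weight size in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: float,
                ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if the weights plus an assumed headroom fit in the given RAM.

    headroom_gb is a hypothetical allowance for the OS, context cache,
    and runtime overhead; tune it for your own machine.
    """
    return weight_footprint_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    print(weight_footprint_gb(27, 6))   # 20.25
    print(fits_in_ram(27, 6, 32))       # True
    print(fits_in_ram(27, 6, 24))       # False
```

The 27B-at-6-bits estimate lands at about 20GB, which lines up with the "north of 20GB" figure, so 32GB is workable but not roomy.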


Not who you asked, but I've got a Framework Desktop (Strix Halo) with 128GB RAM. On Linux, up to about 112GB can be allocated to the GPU. I can run Qwen3.5-122B (4-bit quant) quite easily on this box. I find qwen3-coder-next (80B param, MoE) runs quite well at about 36tok/sec. Qwen3.5-27b is a bit slower at about 24tok/sec, but that's a dense model.


Why don’t you do the planning yourself? It’s very likely to be a better plan.


They're not perfect but the local model game is progressing so quickly that they're impossible to ignore. I've only played around with the new qwen 3.6 models for a few minutes (it's damn impressive) but this weekend's project is to really put it through its paces.

If I can get the performance I'm seeing out of free models on a 6-year-old MacBook Pro M1, it's a sign of things to come.

Frontier models will have their place for 1) extensive integrations and tooling and 2) massive context windows. But I could see a very real local-first near future where a good portion of compute and inference is run locally and only goes to a frontier model as needed.


I've had really good results from qwen3-coder-next. I'm hoping we get a qwen3.6-coder soon, since Claude seems to get less and less available on the pro plan.


If Apple silicon keeps making the gains it has been making, a Mac Studio with 128GB of RAM + local models will be a practical all-local workflow by, say, 2028 or 2030. OpenAI and Anthropic are going to have to offer something really incredible if they want to keep subscription revenue from software developers in the near future, imo.


In my experience Azure is full of consistency issues and race conditions. It's enough of an issue that when I mentioned new OpenAI models becoming available via Bedrock on AWS, and how convenient that was since I wouldn't have to deal with Azure, my colleague in enterprise architecture went on an unprompted rant about these exact issues. It's not the first time something like this has happened, and I've experienced these issues firsthand. So yes, I'd say reliability is a critical issue for Azure, and it hasn't gotten better each time I've gone back to check.

