Since Gemma 4 came out this Easter, the gap between self-hosted models and Claude has decreased significantly, I think. The gap is still huge; it's just that local models were extremely non-competitive before Easter. Now it seems Qwen 3.6 is another bump up from Gemma 4, which is exciting if it holds. I keep an Opus close of course, because these local models still wander off in the wrong direction and fail. Something Opus almost never does for me anymore.
But every time a local model gets me by, I feel closer to where I should be; writing code should still be free. Both free as in free beer, and free as in freedom.
My setup is a separate dedicated Ubuntu machine with an RTX 5090. Qwen 3.6:27b uses 29/32 GB of VRAM as it's working right this minute. I run Ollama in a non-root Podman instance. And I use OpenCode as an ACP service for my editor, which I highly recommend. ACP (Agent Client Protocol) is how the world should be, in case you were asking, which you didn't :)
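In case it helps anyone replicate the setup, a rootless Podman + Ollama arrangement might look roughly like this. This is a sketch from memory, assuming the official ollama/ollama image, nvidia-container-toolkit with a generated CDI spec, and that the model tag is qwen3.6:27b; exact flags and tags may differ:

    # Run Ollama rootless in Podman, GPU exposed via CDI
    # (assumes nvidia-ctk has generated the CDI spec for the GPU)
    podman run -d --name ollama \
      --device nvidia.com/gpu=all \
      -v ollama:/root/.ollama \
      -p 11434:11434 \
      docker.io/ollama/ollama

    # Pull and chat with the model (tag is a guess)
    podman exec -it ollama ollama run qwen3.6:27b

OpenCode can then point at the Ollama endpoint on localhost:11434.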
Exciting times, and thank you, Qwen team, for making the world a better place in a world of Sam Altmans.
>> I feel closer to where I should be; writing code should still be free. Both free as in free beer, and free as in freedom.
I'm just pleased by the competition. I agree with the ideal of free and local, but sustainable competition is key: driving $200/month down to a much, much lower number.
I use Qwen 3.5 122B on an RTX PRO 6000 with OpenCode, and I'm very pleased. I don't feel a need to use a closed model any more. The result after answering questions in Plan mode is almost always what I want, with only occasional bugs. It puts a lot of effort into seeing how the code I'm working on is currently written and extending it in the same style.
If they release a Qwen 3.6 that also makes good use of the card, I may move to it.
There was a Qwen 3.6 MoE six days ago that I thought was better than Gemma 4. Today's is a dense model. (Gemma released both a 26B MoE and a 31B dense at the same time.)
I intend to evaluate all four on some evals I have, as long as I don't get squirrelled again.
What level of programming tasks can a 27B model handle? Even with Claude, I'm occasionally not satisfied, and I can't imagine how effective a 27B model would be.
I ran 3 prompts (short versions here; full versions in the repo):
- Implement a numerically stable backward pass for layer normalization from scratch in NumPy.
- Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).
- Implement an efficient KV-cache system for autoregressive transformer inference from scratch.
and tested Qwen3.6-27B (IQ4_NL on a 3090) against MiniMax-M2.7 and GLM-5, with kimi k2.6 as the judge (imperfect, I know, it was 2AM). Qwen surpassed MiniMax and won 2/3 of the implementations against GLM-5 according to kimi k2.6, which still sounds insane to me. The env was a pi-mono with basic tools + a websearch tool pointing to my searxng (I don't think any of the models used it), with a slightly customized, shorter system prompt. TurboQuant was at 4bit during all Qwen tests.
Full results: https://github.com/sleepyeldrazi/llm_programming_tests.
Needless to say, those tests are non-exhaustive and have flaws, but the trend from the official benchmarks looks like it's being confirmed in my testing. If only it were a little faster on my 3090; we'll see how it performs once a DFlash for it drops.
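For context on what the first prompt is asking for, here's a minimal NumPy sketch of my own (not from the repo or any model's output) of the standard closed-form layer norm backward pass:

    import numpy as np

    def layernorm_forward(x, gamma, beta, eps=1e-5):
        # x: (N, D); normalize over the last axis
        mu = x.mean(axis=-1, keepdims=True)
        inv_std = 1.0 / np.sqrt(x.var(axis=-1, keepdims=True) + eps)
        xhat = (x - mu) * inv_std
        return gamma * xhat + beta, (xhat, inv_std, gamma)

    def layernorm_backward(dy, cache):
        xhat, inv_std, gamma = cache
        D = dy.shape[-1]
        dgamma = (dy * xhat).sum(axis=0)
        dbeta = dy.sum(axis=0)
        dxhat = dy * gamma
        # Standard closed form: the mean and variance gradient
        # terms are folded into a single fused expression
        dx = (inv_std / D) * (
            D * dxhat
            - dxhat.sum(axis=-1, keepdims=True)
            - xhat * (dxhat * xhat).sum(axis=-1, keepdims=True)
        )
        return dx, dgamma, dbeta

A finite-difference check against dx is the quickest way to judge whether a model's answer actually got the gradient right.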
Basic triage is good. I've found I still need to handle most of the programming myself, but local models have been good at pointing me at where to look, with just "investigate https://github.com/HarbourMasters/Shipwright/issues/6232" as the prompt.
Can't answer for an RTX 5090, but on an RTX 5080 with 16 GB of VRAM (desktop), I get about 6 tokens/sec after some tweaking (f16 -> q4_0). Kind of on the borderline of usable; realistically you'd probably need either a 5090 with more VRAM or something like a Mac with unified memory.
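A back-of-envelope model helps make sense of numbers like this: single-stream decode is mostly memory-bound, so tokens/sec is roughly bandwidth divided by the bytes of weights streamed per token, and any weights spilled to system RAM drag everything down to the slower bus. A rough sketch, where the bandwidth figures are ballpark assumptions rather than measured specs:

    # Rough decode-throughput estimate for a memory-bound dense model
    def est_tokens_per_sec(params_b, bits_per_weight, gpu_frac,
                           gpu_bw_gbs, cpu_bw_gbs):
        bytes_total = params_b * 1e9 * bits_per_weight / 8
        # time per token = stream GPU-resident part + CPU-resident part
        t = (gpu_frac * bytes_total) / (gpu_bw_gbs * 1e9) \
          + ((1 - gpu_frac) * bytes_total) / (cpu_bw_gbs * 1e9)
        return 1.0 / t

    # 27B at ~4.5 bits/weight fully on a ~900 GB/s GPU: fast
    print(est_tokens_per_sec(27, 4.5, 1.0, 900, 60))  # ~59 tok/s
    # Same model with 20% spilled to ~60 GB/s system RAM: crawls
    print(est_tokens_per_sec(27, 4.5, 0.8, 900, 60))  # ~16 tok/s

That cliff is why 16 GB is the painful spot for a ~27B model at 4-bit: the weights alone are around 15 GB, so context and overhead likely push something off the card.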
A Mac is not going to be all that much faster than a 5080 with any models, other than the ones you can’t currently run at all because you don’t have enough GPU+CPU memory combined.
You’re much better off adding a second GPU if you’ve already got a PC you’re using.
Geopolitics is a reason. Many individuals and companies are scrambling for safe alternatives to US tech. I live in Norway and there is a lot of this going on.
I like this. This is an accurate description of the state of AI at this very moment for me. The LLM is (just) a tool that makes me "amplified" for coding and certain tasks.
I will worry about developers being completely replaced when I see something resembling it. Enough people worry about that (or say it to pump stock prices), and they like to tell everyone about this future too. I just don't see it.
Amplified means more work done by fewer people. It doesn't need to replace a single entire functional human being to do things like kill the demand for labor in dev, which, in turn, will kill salaries.
I would disagree. Amplified means you and I get more s** done.
Unless, that is, there's a limited amount of software we need to produce per year globally to keep everyone happy, nobody wants more than that, and we happen to be at that point right NOW, this second.
I think not. We can make more (in less time) and people will get more. This is the "glass half full" mentality, I think. Why not take that mental route instead? We don't know the future anyway.
In fact, there isn’t infinite demand for software. Especially not for all kinds of software.
And if corporate wealth means people get paid more, why are companies that are making more money than ever laying off so many people? Wouldn’t they just be happy to use them to meet the inexhaustible demand for software?
I do wonder though if we have about enough (or too much) software.
I hear more people complaining about software being forced on them to do things they did just fine without software before, than people complaining about software they want that doesn't exist.
Yeah, I think being annoyed by software is far more prevalent than wishing for more software. That said, I think there is still a lot of room for software growth as long as it's solving real problems and doesn't get in people's way. What I'm not sure about is what the net effect of AI will be overall when the dust settles.
On one hand it is very empowering to individuals, and many of those individuals will be able to achieve grander visions with less compromise and design-by-committee. On the other hand, it also enables an unprecedented level of slop that will certainly dilute the quality of software overall. What will be the dominant effect?
It is a 19th-century economic observation about the use of coal.
It is like saying the PDF is going to be good for librarian jobs because people will read more. It is stupid. It completely breaks down because of substitution.
Farming is the most obvious comparison to me here. Yes, there will be more food than ever before, and the farmer that survives will be better off than before by a lot, but to believe the automation of farming tasks by machines leads to more farm jobs is completely absurd.
That’s not basic economics. Basic economics says that salaries are determined by the demand for labor vs the supply of labor. With more efficiency, each worker does more labor, so you need fewer people to accomplish the same thing. So unless the demand for their product increases around the same rate as productivity increases, companies will employ fewer people. Since the market for products is not infinite, you only need as much labor as you require to meet the demand for your product.
Companies that are doing better than ever are laying people off by the shipload, not giving people raises for a job well done.
Like denying that more efficiency without a commensurate increase in product demand means the demand for labor goes down, which means fewer jobs, and lower salaries? You don’t pay people what they’re actually worth, you pay people what they’ll work for. Requesting more money because you’re making the company more money is only viable if there aren’t qualified people lining up for the chance to take your role. Even without more money, well-paid people tend to regrettably get laid off in those circumstances.
Tell me, when was the last time you visited your shoe cobbler? How about your travel agent? Have you chatted with your phone operator recently?
The lump of labour fallacy says it's a fallacy to claim that automation reduces the net amount of human labor, importantly, across all industries. It does not say that automation won't eliminate or reduce jobs in specific industries.
It's an argument that jobs lost to automation aren't a big deal because there's always work somewhere else but not necessarily in the job that was automated away.
Jobs are replaced when new technology is able to produce an equivalent or better product that meets the demand, cheaper, faster, more reliably, etc. There is no evidence that the current generation of "AI" tools can do that for software.
There is a whole lot of marketing propping up the valuations of "AI" companies, a large influx of new users pumping out supremely shoddy software, and a split in a minority of users who either report a boost in productivity or little to no practical benefits from using these tools. The result of all this momentum is arguably net negative for the industry and the world.
This is in no way comparable to changes in the footwear, travel, and telecom industries.
Current-generation "AI" has already largely solved cheaper, faster, and more reliable. But it hasn't figured out how to curb demand. So far, the more software we build, the more people want even more software. As the lump of labor fallacy suggests, there appears to be no end to finding productive uses for software. And certainly that has been the "common wisdom" for at least the last couple of decades; that whole "software is eating the world" thing.
What changed in the last month that has you thinking that a demand wall is a real possibility?
I agree the pie can grow, but I don’t know that the profession survives in its current form. Whether the next form is personally profitable for those of us who’ve sunk a decade+ into the SWE skillset remains to be seen.
I selfishly hope it is, but imo it's simply too early to tell.
This implication completely depends on the elasticity (or lack thereof) of demand for software. When marginal profit from additional output exceeds labor cost savings, firms expand rather than shrink.
We lost the pneumatic tube [1] maintenance crew. Secretarial work nearly went away. A huge number of bookkeepers in the banking industry lost their jobs. The job of a typist was eliminated/merged into everyone else's job. The job of a "computer" (someone who does computations) was eliminated.
What we ended up with was primarily a bunch of customer service, marketing, and sales workers.
There was never an "office worker" job. But there were a lot of jobs under the umbrella of "office work" that were fundamentally changed and, crucially, your experience in those fields didn't necessarily translate over to the new jobs created.
Right, and my point is that specific jobs, like the job of a dev, were eliminated or significantly curtailed.
New jobs may be waiting for us on the other side of this, but my job, the job of a dev, is specifically under threat with no guarantee that the experience I gained as a dev will translate into a new market.
I think as a dev, if you're just gluing APIs together or something akin to that, similar to the office jobs that got replaced, you might be in trouble, but tbh we should have automated that stuff before we got AI. It's kind of a shame it may be automated by something non-deterministic, though.
But like, if we're talking about all dev jobs being replaced then we're also talking about most if not all knowledge work being automated, which would probably result in a fundamental restructuring of society. I don't see that happening anytime soon, and if it does happen it's probably impossible to predict or prepare for anyways. Besides maybe storing rations and purchasing property in the wilderness just in case.
I agree. They make me nauseous. The same kind of light nausea as car sickness.
I assume our brains are used to stuff we don't notice consciously, and reject very mild errors. I've stared at the picture a bit now, and the finger holding the balloon is weird. The out-of-place snowman feels weird. If you follow the background blur around, it isn't at the same depth everywhere. Everything that reflects has reflections that I can't see in the scene.
I don't feel good staring at it now, so I had to stop.