Since Gemma 4 came out this Easter, the gap between self-hosted models and Claude has decreased significantly, I think. The gap is still huge; it's just that local models were extremely non-competitive before Easter. Now it seems Qwen 3.6 is another bump up from Gemma 4, which is exciting if it holds. I keep an Opus close of course, because these local models still wander off in the wrong direction and fail. Something Opus almost never does for me anymore.
But every time a local model gets me by, I feel closer to where I should be; writing code should still be free. Both free as in free beer, and free as in freedom.
My setup is a separate dedicated Ubuntu machine with an RTX 5090. Qwen 3.6:27b uses 29/32 GB of VRAM as it's working right this minute. I run Ollama in a non-root Podman instance. And I use OpenCode as an ACP service for my editor, which I highly recommend. ACP (Agent Client Protocol) is how the world should be, in case you were asking, which you didn't :)
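In case it helps anyone replicate the setup, a rootless Podman + Ollama arrangement might look roughly like this. This is a sketch from memory, assuming the official ollama/ollama image, nvidia-container-toolkit with a generated CDI spec, and that the model tag is qwen3.6:27b; exact flags and tags may differ:

    # Run Ollama rootless in Podman, GPU exposed via CDI
    # (assumes nvidia-ctk has generated the CDI spec for the GPU)
    podman run -d --name ollama \
      --device nvidia.com/gpu=all \
      -v ollama:/root/.ollama \
      -p 11434:11434 \
      docker.io/ollama/ollama

    # Pull and chat with the model (tag is a guess)
    podman exec -it ollama ollama run qwen3.6:27b

OpenCode can then point at the Ollama endpoint on localhost:11434.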
Exciting times, and thank you, Qwen team, for making the world a better place in a world of Sam Altmans.
>> I feel closer to where I should be; writing code should still be free. Both free as in free beer, and free as in freedom.
I'm just pleased by the competition. I agree with the ideal of free and local, but sustainable competition is key: driving $200/month down to a much, much lower number.
I use Qwen 3.5 122B on an RTX PRO 6000 with OpenCode, and I'm very pleased. I don't feel a need to use a closed model any more. The result after answering questions in Plan mode is almost always what I want, with only occasional bugs. It puts a lot of effort into seeing how the code I'm working on is currently written and extending it in the same style.
If they release a Qwen 3.6 that also makes good use of the card, I may move to it.
There was a Qwen 3.6 MoE six days ago that I thought was better than Gemma 4. Today's is a dense model. (Gemma released both a 26B MoE and a 31B dense at the same time.)
I intend to evaluate all four on some evals I have, as long as I don't get squirrelled again.
What level of programming tasks can a 27B model handle? Even with Claude, I'm occasionally not satisfied, and I can't imagine how effective a 27B model would be.
I ran 3 prompts (short versions here; full versions in the repo):
- Implement a numerically stable backward pass for layer normalization from scratch in NumPy.
- Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).
- Implement an efficient KV-cache system for autoregressive transformer inference from scratch.
and tested Qwen3.6-27B (IQ4_NL on a 3090) against MiniMax-M2.7 and GLM-5, with kimi k2.6 as the judge (imperfect, I know, it was 2AM). Qwen surpassed MiniMax and won 2/3 of the implementations against GLM-5 according to kimi k2.6, which still sounds insane to me. The env was a pi-mono with basic tools + a websearch tool pointing to my searxng (I don't think any of the models used it), with a slightly customized, shorter system prompt. TurboQuant was at 4bit during all Qwen tests.
Full results: https://github.com/sleepyeldrazi/llm_programming_tests.
Needless to say, those tests are non-exhaustive and have flaws, but the trend from the official benchmarks looks like it's being confirmed in my testing. If only it were a little faster on my 3090; we'll see how it performs once a DFlash for it drops.
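For context on what the first prompt is asking for, here's a minimal NumPy sketch of my own (not from the repo or any model's output) of the standard closed-form layer norm backward pass:

    import numpy as np

    def layernorm_forward(x, gamma, beta, eps=1e-5):
        # x: (N, D); normalize over the last axis
        mu = x.mean(axis=-1, keepdims=True)
        inv_std = 1.0 / np.sqrt(x.var(axis=-1, keepdims=True) + eps)
        xhat = (x - mu) * inv_std
        return gamma * xhat + beta, (xhat, inv_std, gamma)

    def layernorm_backward(dy, cache):
        xhat, inv_std, gamma = cache
        D = dy.shape[-1]
        dgamma = (dy * xhat).sum(axis=0)
        dbeta = dy.sum(axis=0)
        dxhat = dy * gamma
        # Standard closed form: the mean and variance gradient
        # terms are folded into a single fused expression
        dx = (inv_std / D) * (
            D * dxhat
            - dxhat.sum(axis=-1, keepdims=True)
            - xhat * (dxhat * xhat).sum(axis=-1, keepdims=True)
        )
        return dx, dgamma, dbeta

A finite-difference check against dx is the quickest way to judge whether a model's answer actually got the gradient right.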
Basic triage is good. I've found I still need to handle most of the programming myself, but local models have been good at pointing me at where to look, with just "investigate https://github.com/HarbourMasters/Shipwright/issues/6232" as the prompt.
Can't answer for an RTX 5090, but on an RTX 5080 with 16 GB of VRAM (desktop), I get about 6 tokens/sec after some tweaking (f16 -> q4_0). Kind of on the borderline of usable; realistically you'd probably need either a 5090 with more VRAM or something like a Mac with unified memory.
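A back-of-envelope model helps make sense of numbers like this: single-stream decode is mostly memory-bound, so tokens/sec is roughly bandwidth divided by the bytes of weights streamed per token, and any weights spilled to system RAM drag everything down to the slower bus. A rough sketch, where the bandwidth figures are ballpark assumptions rather than measured specs:

    # Rough decode-throughput estimate for a memory-bound dense model
    def est_tokens_per_sec(params_b, bits_per_weight, gpu_frac,
                           gpu_bw_gbs, cpu_bw_gbs):
        bytes_total = params_b * 1e9 * bits_per_weight / 8
        # time per token = stream GPU-resident part + CPU-resident part
        t = (gpu_frac * bytes_total) / (gpu_bw_gbs * 1e9) \
          + ((1 - gpu_frac) * bytes_total) / (cpu_bw_gbs * 1e9)
        return 1.0 / t

    # 27B at ~4.5 bits/weight fully on a ~900 GB/s GPU: fast
    print(est_tokens_per_sec(27, 4.5, 1.0, 900, 60))  # ~59 tok/s
    # Same model with 20% spilled to ~60 GB/s system RAM: crawls
    print(est_tokens_per_sec(27, 4.5, 0.8, 900, 60))  # ~16 tok/s

That cliff is why 16 GB is the painful spot for a ~27B model at 4-bit: the weights alone are around 15 GB, so context and overhead likely push something off the card.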
A Mac is not going to be all that much faster than a 5080 with any models, other than the ones you can’t currently run at all because you don’t have enough GPU+CPU memory combined.
You’re much better off adding a second GPU if you’ve already got a PC you’re using.
Geopolitics is a reason. Many individuals and companies are scrambling for safe alternatives to US tech. I live in Norway and there is a lot of this going on.
I like this. This is an accurate description of the state of AI at this very moment for me. The LLM is (just) a tool that makes me "amplified" for coding and certain tasks.
I will worry about developers being completely replaced when I see something resembling it. Enough people worry about that (or say it to pump stock prices), and they like to tell everyone about this future too. I just don't see it.
Amplified means more work done by fewer people. It doesn't need to replace a single entire functional human being to do things like kill the demand for labor in dev, which, in turn, will kill salaries.
I would disagree. Amplified means you and I get more s** done.
Unless, that is, there's a limited amount of software we need to produce per year globally to keep everyone happy, nobody wants more than that, and we happen to be at that point right NOW, this second.
I think not. We can make more (in less time) and people will get more. This is the "glass half full" mentality, I think. Why not take that mental route instead? We don't know the future anyway.
In fact, there isn’t infinite demand for software. Especially not for all kinds of software.
And if corporate wealth means people get paid more, why are companies that are making more money than ever laying off so many people? Wouldn’t they just be happy to use them to meet the inexhaustible demand for software?
I do wonder though if we have about enough (or too much) software.
I hear more people complaining about software being forced on them to do things they did just fine without software before, than people complaining about software they want that doesn't exist.
Yeah, I think being annoyed by software is far more prevalent than wishing for more software. That said, I think there is still a lot of room for software growth as long as it's solving real problems and doesn't get in people's way. What I'm not sure about is what the net effect of AI will be overall when the dust settles.
On one hand it is very empowering to individuals, and many of those individuals will be able to achieve grander visions with less compromise and design-by-committee. On the other hand, it also enables an unprecedented level of slop that will certainly dilute the quality of software overall. What will be the dominant effect?
It is a 19th-century economic observation about the use of coal.
It is like saying the PDF is going to be good for librarian jobs because people will read more. It is stupid. It completely breaks down because of substitution.
Farming is the most obvious comparison to me here. Yes, there will be more food than ever before, and the farmer that survives will be better off than before by a lot, but to believe the automation of farming tasks by machines leads to more farm jobs is completely absurd.
That’s not basic economics. Basic economics says that salaries are determined by the demand for labor vs the supply of labor. With more efficiency, each worker does more labor, so you need fewer people to accomplish the same thing. So unless the demand for their product increases around the same rate as productivity increases, companies will employ fewer people. Since the market for products is not infinite, you only need as much labor as you require to meet the demand for your product.
Companies that are doing better than ever are laying people off by the shipload, not giving people raises for a job well done.
Like denying that more efficiency without a commensurate increase in product demand means the demand for labor goes down, which means fewer jobs, and lower salaries? You don’t pay people what they’re actually worth, you pay people what they’ll work for. Requesting more money because you’re making the company more money is only viable if there aren’t qualified people lining up for the chance to take your role. Even without more money, well-paid people tend to regrettably get laid off in those circumstances.
Tell me, when was the last time you visited your shoe cobbler? How about your travel agent? Have you chatted with your phone operator recently?
The lump of labour fallacy says it's a fallacy to claim that automation reduces the net amount of human labor, importantly, across all industries. It does not say that automation won't eliminate or reduce jobs in specific industries.
It's an argument that jobs lost to automation aren't a big deal because there's always work somewhere else but not necessarily in the job that was automated away.
Jobs are replaced when new technology is able to produce an equivalent or better product that meets the demand, cheaper, faster, more reliably, etc. There is no evidence that the current generation of "AI" tools can do that for software.
There is a whole lot of marketing propping up the valuations of "AI" companies, a large influx of new users pumping out supremely shoddy software, and a split in a minority of users who either report a boost in productivity or little to no practical benefits from using these tools. The result of all this momentum is arguably net negative for the industry and the world.
This is in no way comparable to changes in the footwear, travel, and telecom industries.
Current-generation "AI" has already largely solved cheaper, faster, and more reliable. But it hasn't figured out how to curb demand. So far, the more software we build, the more people want even more software. As the lump of labor fallacy suggests, there appears to be no end to finding productive uses for software. And certainly that has been the "common wisdom" for at least the last couple of decades; that whole "software is eating the world" thing.
What changed in the last month that has you thinking that a demand wall is a real possibility?
I agree the pie can grow, but I don’t know that the profession survives in its current form. Whether the next form is personally profitable for those of us who’ve sunk a decade+ into the SWE skillset remains to be seen.
I selfishly hope it is, but imo it's simply too early to tell.
This implication completely depends on the elasticity (or lack thereof) of demand for software. When marginal profit from additional output exceeds labor cost savings, firms expand rather than shrink.
We lost the pneumatic tube [1] maintenance crew. Secretarial work nearly went away. A huge number of bookkeepers in the banking industry lost their jobs. The job of a typist was eliminated/merged into everyone else's job. The job of a "computer" (someone who does computations) was eliminated.
What we ended up with was primarily a bunch of customer service, marketing, and sales workers.
There was never an "office worker" job. But there were a lot of jobs under the umbrella of "office work" that were fundamentally changed and, crucially, your experience in those fields didn't necessarily translate over to the new jobs created.
Right, and my point is that specific jobs, like the job of a dev, were eliminated or significantly curtailed.
New jobs may be waiting for us on the other side of this, but my job, the job of a dev, is specifically under threat with no guarantee that the experience I gained as a dev will translate into a new market.
I think as a dev, if you're just gluing APIs together or something akin to that, similar to the office jobs that got replaced, you might be in trouble, but tbh we should have automated that stuff before we got AI. It's kind of a shame it may be automated by something non-deterministic, though.
But like, if we're talking about all dev jobs being replaced then we're also talking about most if not all knowledge work being automated, which would probably result in a fundamental restructuring of society. I don't see that happening anytime soon, and if it does happen it's probably impossible to predict or prepare for anyways. Besides maybe storing rations and purchasing property in the wilderness just in case.
I agree. They make me nauseous. The same kind of light nausea as car sickness.
I assume our brains are used to stuff we don't notice consciously, and reject very mild errors. I've stared at the picture a bit now, and the finger holding the balloon is weird. The out-of-place snowman feels weird. If you follow the background blur around, it isn't at the same depth everywhere. Everything that reflects has reflections that I can't see in the scene.
I don't feel good staring at it now, so I had to stop.