Indirectly I use Mistral daily: I like Proton’s Lumo private chat, which runs on Mistral technology. Something like Lumo is good enough to replace search and general information browsing, and for me it is very practical.
What is not so practical is my paying for Gemini Ultra, which has some practical value but which I mostly pay for because it is fun to use strong AIs like Claude and Gemini Pro in AntiGravity. It feels funny to admit paying a lot of money just to have fun with something.
I wish Mistral good luck, and I like their forward-deployed engineers approach to business. It seems practical.
I'm looking forward to having a more extensive Kagi API and linking it to Lumo. I tried this with Claude and it works pretty well, but it is too expensive right now. I would love to have more flexibility.
A friend taught me and a few other friends how to play Mahjong early this year. Great game! You do need a skilled instructor and four people to play. This article is good; I wish I had had it five months ago.
So much of the public hates AI, at least among the non-tech people I talk with. Good to see so much common sense among the general public.
While I find a Gemini Ultra subscription worthwhile for myself, most of the value is in the fun and entertainment of interacting with a strong AI in AntiGravity (I usually use Claude models), the Gemini App, NotebookLM, etc. It is intellectually interesting and fun.
Can I justify the cost to society for data centers, possibility of US government bailing out the AI tech giants, etc.?
No, I can't. I think the Chinese are skunking us. Building cheaper AI is the winning strategy. GLM-5.1 and Deepseek v4 are amazingly effective at much lower inference costs.
I live in Emacs, but I will give Mine a try when I get a free hour. I read about Coalton on X and follow the author, but I haven't yet invested the time to try it out.
This is an interesting conversation: some of you are correctly pointing out that so much AI-driven development is not having a positive effect on business profitability, while others are correctly pointing out positive results that are evidence of great future progress.
I have an unusual set of metrics for evaluating AI. I am old and comfortably retired, but I still like to experiment with AI tools for updating many of my old (or ancient) open source projects and creating new projects. I am blown away by how good both my dedicated Hermes Agent setup on a VPS and Google AntiGravity with Claude and Gemini are. Both systems are unbelievably good.
I can only imagine how effective companies with a solid engineering process will be as appropriate roles for human and AI developers solidify. I can also imagine companies with a poor process and poor engineering taste will waste a lot of money.
Where is the software that is produced using the purported efficiency gains and why are you pulling up the ladder behind you?
If you have been in AI for such a long time, of course you are biased and want to see it succeed.
Look at what has been written in open source since 2023. Very little. There are no efficiency gains and the incessant talk about prompts and AI just paralyzes the entire field. And people who love to talk have the ears of the managers.
I used the flash version on a tricky Common Lisp coding problem this morning. The first cut of the new library had a runtime error. I was running in a simple REPL using:
ollama run deepseek-v4-flash:cloud
so I had to feed the generated code and the error back into the REPL manually, but it nailed it the second time, and the Common Lisp code was very good.
1. The 35B model is a "Mixture of Experts" model, so the earlier commenter's point notwithstanding, "larger" does not mean more capable. Those types of models only have certain parts of themselves active at a time (for 35B-A3B, only 3 billion parameters, vs. 27 billion for the model this post is about) to speed up inference. So if you're approaching these things for the first time, Qwen3.6-35B-A3B is a good choice, but it is likely not as capable as the model this thread is about.
2. It's hard to cite precise numbers because it depends heavily on configuration choices. For example:
2a. On a MacBook with 32GB unified memory you'll be fine. I can load a 4-bit quant of Qwen3.6-35B-A3B supporting max context length using ~20GB RAM.
2b. That 20GB would not fit on many consumer graphics cards, but there are still things you can do ("expert offloading"). On my 3080 I can run the same model, at the same quant, with essentially the same context length, despite the 3080 only having ~10GB VRAM, by splitting some of the work with the CPU (roughly).
Layer offloading will slow things down compared to keeping layers fully resident in memory, but it can still be fast. Iirc I've measured my 3080 at ~55 tok/s, while my M4 Pro with 48GB gets maybe ~70 tok/s? So a slowdown, but still usable.
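The memory numbers above follow from simple back-of-the-envelope arithmetic: a quantized model's weights take roughly (parameter count × bits per weight / 8) bytes, plus some overhead for the KV cache and runtime. A minimal sketch (my own rough estimate, not from the thread; the 2GB overhead figure is a guessed fudge factor):

```python
def quant_footprint_gb(total_params_billions: float,
                       bits_per_weight: float,
                       overhead_gb: float = 2.0) -> float:
    """Rough memory estimate for a quantized model:
    weights take params * (bits/8) bytes; overhead_gb is an assumed
    allowance for KV cache and runtime buffers."""
    weights_gb = total_params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

# A 35B-parameter model at a 4-bit quant:
print(round(quant_footprint_gb(35, 4), 1))  # 19.5
```

That lands close to the ~20GB observed for the 4-bit quant, and it also shows why the same model won't fit in 10GB of VRAM without offloading some layers or experts to the CPU.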
If you want to get your feet wet with this, I'd suggest trying out
* LM Studio, and
* the zed.dev editor
They're both pretty straightforward to set up and pretty respectable. zed.dev gives you very easy configuration to get something akin to Claude Code (e.g. an agent with tool-calling support) in relatively little time. There are many more fancy things you can do, but that pair is along the lines of "set up in ~5 minutes", at least after downloading the applications + model weights (which are likely larger than the applications). This assumes you're on a Mac. The same stack still works with Nvidia, but requires more finicky setup to tune the amount of expert offloading to the particular system.
It's plausible you could do something similar with LM Studio + VS Code; I'm just less familiar with that.
I have an old Mac Mini with 32G of integrated RAM, and the following works for me for small local code changes:
ollama launch claude --model qwen3.6:35b-a3b-nvfp4
In addition to not having an integrated web search tool, one drawback is that it runs more slowly than using cloud servers. I find myself asking for a code or documentation change, and then spending two minutes on my deck getting fresh air while waiting for the slower response. When using a fast cloud service I am a coding slave, glued to my computer. Still, I like running local when I can!
I am on Google's $20/month plan, and I usually get about three half-hour coding sessions a week with AntiGravity using the Claude models. The limit using Gemini Pro models is much higher. I am retired so Google's $20 plan is sufficient for me, but I understand that people who are still working would need higher limits.
I am also on a $10/month plan with Nous Research for supplying open models for their open source Hermes Agent. I run Hermes inside a container, on a dedicated VPS as a coding agent for complex tasks and so far I find the $10/month plan is enough for about five to ten major tasks a month. I think it is also a good deal.