phillc73's comments | Hacker News

I am a Mistral Le Chat Pro subscriber. I specifically chose to test their offerings because they are European. I don't have the necessary local hardware to run really big models, therefore need to choose a cloud provider if I want LLM action.

I find the antics of Anthropic, OpenAI, Google, Microsoft distasteful and avoid their products where I can.

After testing Le Chat and Devstral-2 for a while, I felt their offering was good enough to stump up some cash for it. I appreciate that many of their models are open weights and Apache 2.0 licensed. In general, I've been happy enough with the service and quality.

Maybe others are better, but I have little reason to change right now. If curiosity gets the better of me, I'll be looking at Qwen, Kimi, GLM, Deepseek, other open weights models, before Anthropic and OpenAI.


I use their API for several models, both for personal and professional use. I think their approach (smaller, specialised models that are well-adapted for specific tasks) is a very good fit for how I work. And even the more general-purpose ones, like the chat model, are just... refreshingly good in a lot of ways. My "ruthless review" prompt, which I use for, well, ruthless, guided reviews of early technical drafts, produces good technical results for early reviews and holy crap is it ruthless and does it know how to swear. By the time Claude or ChatGPT are done being honest about how right I am to push back and gently circling back, Mistral's large model has sent me back to the drawing board twice.

Being in the EU does smooth a lot of things in terms of compliance, payment processing and whatnot, but I also like that their data retention and privacy policies are pretty clearly spelled out. If I need to know something, there's a good chance it's explained outright somewhere and I don't need to read between the EULA lines and wonder what it means.

I do hit limits in terms of capabilities sometimes, and I'm sure other providers' services offer better results for some things. But the businesses built on top of those more capable models feel too much like a scam at this point and I'd rather not depend on them for anything I actually need.


That ruthless review prompt seems interesting; would you be willing to share it? I've been trying to have Claude act as a reviewer for me and it feels like it will never disagree.

It's very hard to untangle it from the rest of its context (the prompt is built dynamically, from a lot of parts, some project-specific, some specific to my preferences, some built from interaction history), so I can't really share it. In any case, I don't think it's some specific prompt engineering sorcery I'm doing, it's not like I've spent any real time refining it or experimenting with various magical incantations. It's probably just some model features making it more amenable to the kind of instructions that are relevant in these cases (directness, questioning trade-offs, thoroughness etc.). My chatbot swears equally graphically in review prompts and news-summarizing prompts, so I'm pretty sure I'm not tickling the machine just right :)

Can you share some of its output for reference?

I've found it helps to tell it how to push back. You get to know where the additional guidance is needed after using it for a while.

Mistral models are definitely good enough. Most people fall for what I call the SOTA Logical Fallacy: whenever there is a 'better model', they think they need to use it, when less-powerful models actually perform the same tasks just as well. (it's an inverse form of the Shifting Baseline Syndrome: every time a new model comes out, people shift their baseline of what is acceptable, despite the fact that a previous baseline was acceptable for the same task)

Devstral Small 2 was (and remains) a particularly strong small coding model, even beating larger open weights. Mistral's "problem" is marketing; other providers ship model updates constantly so they remain in the news and seem like they're "beating" the competition. And it works: people get emotionally attached to brands and models, deciding who's better in the court of popular opinion, and that drives their choices (& dollars).


My biggest issue with Devstral and even their biggest model is that they’re dangerous unless closely directed and reviewed and i mean CLOSELY. Unfortunately mistral models will believe and do anything.

See: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

See some of the test results, it’s horrifying


FWIW personally i prefer this. When i tried Qwen3.6 and asked it a few questions, while it did respond, it was ADAMANT i should do something else when i really wanted an answer to the question i made. It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.

Meanwhile Devstral Small 2 just answers the damn question.

I don't want to have to convince my computer to do what i want it to do, i want it to do what i ask it to.


> It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.

Don't you think there's usually a good reason for this? Whenever this happened to me, the problem was my ignorance.


I think there is a reason why people do that: trying to steer those they consider newbies away from patterns they consider bad. But at the same time this second-guessing can be annoying when you know what you want to do (especially when the original question isn't actually answered yet it comes up in search engine results...).

I can't say if it is a good reason in general, perhaps it is, but it certainly is something i personally find annoying. I think answers should provide an answer to the question asked and then, after that answer was given, they could also give pointers for whatever they consider a better approach and why - this is important, IMO, for a public forum where people of all backgrounds and goals can read the same stuff.

But either way, LLMs IMO should do/provide what they are asked without trying to second guess the user (or at least, there should be LLMs that act like that).


That’s my experience as well; if it’s Opus pushing back, it’s usually an actual issue with the code or prompt.

FWIW i haven't used Claude or any other cloud-based LLM, only what i can run on my PC, so it could be that Claude is smart enough to follow the user's instructions, keep the equivalent of a mental state of what the user seems to want to do and only push back when it really makes sense whereas a small local LLM is too stupid to judge all that and Qwen3.6 errs on the side of being annoyingly cautious while Devstral Small 2 errs on the side of trusting the user being really okay with blowing their toes off :-P. As i wrote in my original reply, this is my personal preference and i prefer the LLM to just do what i ask.

TBH sometimes i feel like i'm "emotionally attached" to Mistral's models because i always end up using them :-P. However that is because, as you wrote, their small models (i only use local stuff) are very strong. In fact i was trying Qwen3.6 27B recently and while it is nice that it can do tool calls during the reasoning process (i had it confirm its thoughts by writing Python code) it often ended up confusing itself (regardless of tool calls) during reasoning, ending up in loops where it questions itself over and over endlessly.

Devstral Small 2 however just works, for the most part. Qwen3.6 27B can probably handle more complex tasks (when i asked it as a test to write a function that checks for collision between two AABBs in C and gave it a tool to call Python code for confirmation, it actually wrote a Python script that writes C code with the tests, then calls GCC to compile the C code and runs the binary to run the tests, which is something Mistral's small models couldn't do) but i always felt i can just leave DS2 doing stuff in the background (or when i'm doing something else) and it'll produce something relatively useful whereas the little time i spent with Qwen3.6 27B it felt more "unstable" (and much slower, both because of literally slower inference and because of endless reams of text).

Recently i also started using Ministral 3B and 14B - these can do some reasoning too, and for very simple stuff Ministral 3B is very fast (i actually didn't expect a 3B model to be anything more than a novelty), and they have some vision abilities (though they're quite mediocre at vision so i haven't found much use for this - passing something via GLM-OCR to extract all text and feed it to another model feels more practical).

Also as i wrote in another comment, every Mistral model i've tried never questioned me, which i certainly prefer


For certain tasks that are not hard but depend on a clear specification, it's even better to have a less capable model, because it forces you to write a better description of what you want, ending up with better results. I will defend my PhD thesis soon and I will buy a yearly Mistral subscription at the student price to get it for cheap.

> Most people fall for what I call the SOTA Logical Fallacy: whenever there ...

I think you'll find that ML now pretty much IS the HPC market, there's no distinction anymore. And the HPC market has always had the "being #1 gets you 99% of all business", even if #2 is only 10% behind SOTA.

Given what it's used for (ie. military applications, incl. nuclear weapons, but also rocket designs, flight planning, large-scale simulations), this is probably justifiable: part of it is states keeping in mind what the second prize in a war is worth ...


There is also risk from the US regulatory side, as the recent drama around Anthropic showed.

Don’t think it’s inconceivable that the clowns in power decide to limit api access out of the blue one day because someone whispered a conspiracy theory in someone’s ear. API blockade…

See also the constant flip flopping on what cards NVIDIA can export - no consistency in stance or coherent policy


You are conflating three very different things.

The thing with Anthropic and the military was about whether vendors can tell the military what operations it's permitted to do. It has no bearing on the commercial sector, and isn't actually about AI.

The thing with NVIDIA cards is a continuation of how we've restricted tech exports for quite a while. You can find old news articles about game consoles being export-restricted over nuclear proliferation concerns. This AI-related one was about whether or not custom AI models are relevant to national security, and whether restricting graphics card sales can have a meaningful impact on them.

Any issue with selling chat tokens internationally would be more akin to the recent tariff shenanigans.


Changing your LLM inference provider is the easiest switch in technology I can think of. It's quicker than taking off the case of your phone and putting on a new one.

Enough hardware and good models exist now that if you do get blocked from one place that viable alternatives do exist.
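A rough sketch of why that switch is so cheap: most hosted providers (and local servers like llama.cpp) expose the same OpenAI-style chat-completions request shape, so often only the base URL and API key change. The base URLs and model name below are illustrative assumptions, not a definitive list.

```python
import json

# The same chat-completions payload works across OpenAI-compatible providers;
# only the base URL (and the API key you send with it) differs.
PROVIDERS = {
    "mistral": "https://api.mistral.ai/v1",
    "local-llamacpp": "http://localhost:8080/v1",  # llama.cpp's built-in server
}

def build_request(provider: str, model: str, prompt: str) -> tuple[str, str]:
    """Return (endpoint URL, JSON body) for a chat-completion call."""
    url = f"{PROVIDERS[provider]}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Switching provider is a one-string change:
url, body = build_request("local-llamacpp", "devstral-small-2", "Hello")
print(url)  # http://localhost:8080/v1/chat/completions
```

In practice the same idea shows up as a single `base_url` setting in most client libraries, which is why the "multi-month project" part (when it exists) is about data governance, not code.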


> Changing your LLM inference provider is the easiest switch in technology I can think of.

That’s true right up until you’re working with confidential info in a corporate context. Then it’s a multi-month, cross-discipline, cross-jurisdiction project, not an edit in a config file.


L O C A L M O D E L S

All data stays on computers that you control.

Same API. Localhost.


Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf. This 7.5GB model runs well in llama.cpp on my 2021 Macbook Pro and is good at both coding and business document analysis tasks.

> Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf.

This sounds like such a shitpost I initially thought you were joking... but this seems to be a real model???


There's a method to the madness:

- Mistral-Nemo: the actual model developed by Mistral and Nvidia.

- 2407: likely the release date of the base model, July of 2024.

- 12B: the model has 12 billion parameters.

- Thinking: the model operates in thinking mode (generates an output plan and ingests it before producing the actual output).

- Claude-Gemini-GPT5.2: I think this means the model was finetuned with session data from Claude, Gemini, and GPT5.2 to replicate their behavior.

- Uncensored-HERETIC: the model was uncensored using the automated Heretic method.

- Q4_k_m: the model is quantized (lossy compression) to ~5 bpw from the original 16 bpw.
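As a back-of-the-envelope check (assuming ~5 bits per weight for Q4_K_M, per the last item above), the quantized size lines up with the 7.5GB file mentioned earlier:

```python
params = 12e9          # 12B parameters
bits_per_weight = 5.0  # Q4_K_M averages roughly 5 bpw (assumption; it mixes 4- and 6-bit blocks)

# bits -> bytes -> gigabytes (decimal GB)
size_gb = params * bits_per_weight / 8 / 1e9
print(round(size_gb, 1))  # 7.5
```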


Yea, I know what the parts individually mean. I just meant that as a whole it seemed so absurd.

It is! I like to try the variations from possibly 'interesting' people.

Some of them are good. Others randomly break into gibberish and Chinese poetry(?).


> I can not pay with a bank note suddenly?

In some places you cannot. I was in London post-COVID and there were a bunch of tourist things, like a riverboat on the Thames, where you could only pay with a card. Went to a craft cider bar out in the countryside and again, they didn’t accept cash. Personally, I think businesses should be forced to accept all legal tender, which means cash stays as a first class payment method, but that’s not how it is in many places.

On the other hand, in Austria there are many places that are cash only, especially small restaurants in the countryside or community sporting events with coffee bars.


These are already available. Mistral’s Vibe CLI [1] is open source. Tools like goose [2] are API agnostic.

[1] https://github.com/mistralai/mistral-vibe

[2] https://goose-docs.ai/



thanks, will try both.

if you've actually migrated an existing claude code setup to one of them, curious how the portability story worked. that's the part i'd been worried about.


I've not tried actually migrating from Claude Code... but having played a bit with other clients, I would avoid Mistral Vibe. I want to love it, there's some things that are nice about Vibe (mostly just "oui oui baguette"), but the things I did not like about it were disastrously bad. I could barely get MCP servers configured, and it was in something of a broken state even when I did get it working. I have many words about how horrified I am at how far behind Mistral is, but I will spare the rant.

OpenCode is another one to consider looking at: https://opencode.ai/ Not sure I'd recommend it, but it's worthy of consideration, as is Pi.

Also, consider that you can build your own. I've got Claude Code in the background working on improvements to my own harness (just for myself) at the moment. Though my intention is to have a mini API-only Claude Code that I can use on retro machines that don't support it, I don't need a full Claude Code feature set.


Sorry, I can’t help with that, as I’ve never actually used Claude. Started with Mistral and stuck there (they still have a free API tier, but I ended up buying their Le Chat Pro service anyway, for the image generation).

You can combine the two for better effect.

1g of Paracetamol with 400mg of Ibuprofen gives similar pain relief as 2mg of IV morphine.[1]

[1] https://pubmed.ncbi.nlm.nih.gov/29017585/


Plus caffeine, for those who don't drink coffee. Quite a standard combo for people suffering from migraine. I stick to 500mg+200mg, and I find it suspicious that adverts for painkillers somehow always show two pills while the dosage recommended in the leaflet is just one.

Mistral Codestral [1]

The Mistral Studio product is currently free to use. Le Chat Pro costs €18/month. [2] Either API key will work with Vibe CLI.

[1] https://mistral.ai/

[2] https://mistral.ai/pricing


Are you using Mistral? Is it good? I wasn't able to find benchmark comparisons with Opus

I like it, but I haven't used Opus so it's hard to compare.

From the benchmarks I've seen, Mistral's offerings are right up there with other open source options, but not quite at the frontier with closed source models.


Paramedicine and nursing. These roles will adapt and use AI, but because they're still so hands-on, and there's already a shortage of staff in those roles in general, I don't see job cuts there.

Job cuts no, but your next nurse might just be a "gig nurse" that bid the lowest for the job. https://www.theguardian.com/us-news/2026/apr/21/healthcare-n...

PE is highly destructive to good healthcare

PE is highly destructive to anything it touches.

Agree, but I can't learn it now - I am in a tech space.

Plenty of tech happening in that space too.

As examples, check out:

Cosinuss: https://www.cosinuss.com/en/

Medictool: https://www.medic-tool.com/

LifesaverSim: https://www.lifesaversim.com/


> Brave Origin is a paid version of the browser for users who don't need all the features that support Brave as a business, but still want the privacy that only Brave offers.

I suppose it's a version of the browser without the Basic Attention Token, API assistant, integrated VPN etc. I guess those are the things that "support Brave as a business." But at that point, I'd rather just use Mullvad browser.


Or Helium, or Librewolf.


You could try carrying cash again, especially just a few coins for the homeless.


> You could try carrying cash again, especially just a few coins for the homeless.

I make a point of carrying cash. Especially for buskers. It's just so simple to push the "cash back" button at the PoS when I'm buying something else.

(I miss the days when you could buy a CD from a talented busker.)


Why “licence (sic)”? It’s the correct spelling.


I know it's the correct spelling for UK English, but US uses license. I just wanted to make sure that I copied it verbatim, that's all.

