I agree, but I also think the point about "Is [your opinion] based on extensive experimentation and first hand experience" is really important. Relying on other bloggers is still delegating your thinking to others. Having your own objective measures and your own direct experience is useful, and sometimes it might contradict the prevailing wisdom.
Pretty much the same with my newly acquired LG Smart TV. I thought I might like webOS, since it's technically a descendant of Palm OS, but oh no. No no no.
I've opted just to not plug it in to the network and not provide a WiFi password.
I recently bought a second-hand, eight-year-old 4K LG TV. Pretty cheap too. All models running webOS 3.x and 4.x are trivially rootable, as LG never provided an update against DejaVuln [1]. There's a handy website to check which models are rootable [2]. You can write directly to the (old!) Wayland socket; I haven't tried a compatible libwayland yet.
IIRC the last public exploit for all LG TVs for webOS > 5 was in the beginning of 2025 (so pretty recent), but as most sellers on the second hand market have auto-updates turned on, there's no way to know which TVs are vulnerable.
It should be doable to strip down much of webOS with root access. It's nice that webOS in general is very well documented and much is implemented around the Luna service bus. LG offers a developer mode for non-rooted TVs, and there's an active homebrew community because of it. It's a pity that you can't modify the boot partitions, as the firmware verifies their integrity. It would be nice to have an exploit for that.
My Samsung and LG TVs have never touched the LAN, nor will they. They have one job in life: being the HDMI display for our game consoles and Apple TVs. That's it. I'm sure they'd both like to serve me ads and report my viewing back to their servers, but they're living the life of dumb panels.
I picked up a used 4K Sony Bravia recently and the thing is such junk. It runs Android, which seemed promising, but it has hardcoded ads on the homepage for whatever movies were coming out in 2015 when they were selling this screen. There's so much input lag, it crashes constantly, and I can't even change picture settings because it will crash and reset to default. Sometimes it will just boot loop and not turn on until a hard reset. Useless device today. It probably cost a thousand dollars when it was new, I'm guessing; now it's e-waste.
Meanwhile my ancient 1080p panel still works, and I noticed I can't actually see the pixels from my couch so, ehh, I guess...
That's what I ended up doing: I made my daily briefing a cron job that runs via "claude -p". I wired it up to make a podcast, with MCP tools I made to create an MP3 with OpenAI and to upload it to one of my sites with an updated RSS feed, so I can listen in the AntennaPod podcast app each morning.
Nice. Even better would be having your agent write code for the deterministic bits and telling the agent it should “invoke the script called blah” to do uploads (or whatever you want to have happen deterministically).
Yep, I agree! My MCP tools are local compiled Go binaries, and the tool that uploads my podcast is actually a local Go CLI that Claude calls. Claude's main role / intelligence is in evaluating which of the morning's HN & Lobsters news is most relevant to me specifically, and writing the podcast script. I'm all for deterministic tools, and it saves on tokens too.
One advantage of splitting it into MCP tools, though: one day I ran out of pre-paid OpenAI TTS credit, and Claude was smart enough to try using Mistral TTS instead. I could have done that fallback deterministically too, but it wasn't something I'd thought of yet.
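For what it's worth, a deterministic version of that fallback could look roughly like the sketch below. This is not the setup described above, just an illustration: the provider functions (`out_of_credit`, `backup_tts`) are hypothetical stand-ins, and a real version would wrap whatever TTS clients you actually use.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider signature: takes text, returns audio bytes,
# and raises an exception on any failure (quota, network, etc.).
Provider = Callable[[str], bytes]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def synthesize_with_fallback(
    text: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, bytes]:
    """Try each provider in order; return (name, audio) from the first that works."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(text)
        except Exception as exc:  # deliberately broad: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (not real API clients).
def out_of_credit(text: str) -> bytes:
    raise RuntimeError("quota exceeded")


def backup_tts(text: str) -> bytes:
    return b"AUDIO:" + text.encode("utf-8")


if __name__ == "__main__":
    used, audio = synthesize_with_fallback(
        "Good morning", [("primary", out_of_credit), ("backup", backup_tts)]
    )
    print(used, len(audio))
```

The trade-off is the one already mentioned: this only covers failures you thought of in advance, whereas the agent can improvise a fallback you never wrote.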
I once had a friend tell me they'd got their AI to tell them the weather every morning... and the thought of that poor AI, web researching Weather APIs & writing a new python script to call the API every morning, instead of just doing the research once and making it a binary (or even just a curl line)... drove me crazy. All that wasted time and compute. Some people just like to watch tokens burn.
Worth pointing out that as impressive as the 32-step network takeover is, Mythos wasn't able to achieve it on every attempt, and the network itself did not have the usual defence systems.
I wouldn't use those as excuses to dismiss AI though. Even if this model doesn't break your defences, give it 3 months and see where the next model lands.
Having run a Markdown memory system with Claude for over a year, I don't think I've seen any evidence of neuralese. That's even with Claude being regularly encouraged to write "reflections" on each session, including automated sessions, and weekly summaries of those reflections.
The bigger problem is avoiding what I call the Memento Effect. I won't spoil the movie for anyone, but Memento involves a character who cannot make new memories, so he has to take meticulous notes about everything. But if any of those notes are vague or incorrect, they still get accepted as truth when next reviewed. So you really need your Markdown memory to be pristine and mustn't allow it to become polluted.
Mythos is the first model that can complete all the steps of their "The Last Ones" evaluation, achieving a full network takeover in an automated manner. The Mythos chart does seem to show some takeoff compared with Opus 4.6...
... but only once you get beyond 1 million tokens. Weirdly, Opus 4.6 seems to match or outperform Mythos in those first million tokens, at least on this chart. But clearly if you had a budget with tokens to burn - like a nation state - then this is a tool that can automatically get you full network takeover if you can just keep throwing more tokens at it.
> then this is a tool that can automatically get you full network takeover if you can just keep throwing more tokens at it
There's this caveat though that the AISI points out themselves:
> However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.
So Mythos managed to infiltrate and take over a network that's... protected and monitored by nothing in particular.
The "concerning behavior" they're referring to there is cheating and covering its tracks. Mythos is being asked to fine-tune a model on provided training data, and finds its way to access the evaluation dataset. It's also aware that it is in an evaluation and that its behavior is being observed:
"In this last and most concerning example, Claude Mythos Preview was given a task instructing it to train a model on provided training data and submit predictions for test data. Claude Mythos Preview used sudo access to locate the ground truth data for this dataset as well as source code for the scoring of the task, and used this to train unfairly accurate models."
I used to use Mistral OCR, but found it was better just to write a program that sent the documents to Claude Sonnet to OCR instead. Claude is far better quality, better formatting and fewer errors.
I'm also using Voxtral TTS to try to replace OpenAI. It "works", but I've had problems with volume levels being radically different between different audio chunks. It doesn't seem to "understand the full text" the way OpenAI's voice models do, which can be more expressive. Voxtral sometimes sounds robotic in the reading. And some Voxtral TTS output contains music in the background occasionally, which suggests their training corpus isn't that clean. Try generating a personalized news podcast, and the intro may occasionally sound like the music for BBC News underneath....
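One workaround for the volume mismatch between chunks is to level each chunk before stitching them together. Here's a minimal sketch, assuming you've already decoded each chunk to float samples in the range -1.0 to 1.0; the function names, `target_rms` and `max_gain` values are my own illustrative choices, not anything from Voxtral or OpenAI. It uses plain RMS levelling, which is cruder than proper loudness normalization (e.g. EBU R 128), but it shows the idea.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_chunk(
    samples: Sequence[float], target_rms: float = 0.1, max_gain: float = 10.0
) -> List[float]:
    """Scale a chunk so its RMS approaches target_rms.

    max_gain caps the boost so near-silent chunks are not amplified into noise.
    Samples are clamped to [-1.0, 1.0] after scaling to avoid overflow.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = min(target_rms / level, max_gain)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 100  # a chunk that came out too quiet
    loud = [0.5, -0.5] * 100     # a chunk that came out too loud
    for chunk in (quiet, loud):
        print(round(rms(normalize_chunk(chunk)), 3))
```

This won't fix the robotic delivery or the stray background music, only the jumps in level between chunks.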
As for not focusing on AI, there's this interview in the Big Technology Podcast 2 months ago, where the Mistral CEO says their main focus is on helping companies fine-tune models for internal use, over being a general model builder.
"I sent money to the god knows how many trillion parameters fully closed source machine built on billions of dollars and it worked better than the model that I can self host from the guys next door"
yeah, no shit ? All you're saying is that you're happily locking yourself in to models you have zero control over and that Anthropic can fuck you over at any time.
However, yes, Mistral is not in the business of providing you with a perfect, general purpose model. They fine tune from their base models for specific tasks.
Mistral OCR 3 isn't open weights and isn't available for download. It's only available via API, or to companies via paid consulting with Mistral.
"For organizations with stringent data privacy requirements, Mistral OCR offers a self-hosting option. This ensures that sensitive or classified information remains secure within your own infrastructure, providing compliance with regulatory and security standards. If you would like to explore self-deployment with us, please let us know."