I agree with explicit parentheses but please be careful about assuming associativity! The risk when handling floating-point arithmetic in particular is that associativity breaks, and suddenly a + (b + c) does NOT equal (a + b) + c. Not only can these lead to unexpected and hard-to-trace failure patterns, but depending on the details, they also can introduce memory overflow/underflow vulnerabilities.
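To make the associativity break concrete (Python floats are IEEE-754 doubles, so this reproduces on any mainstream platform):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False

# The effect compounds with magnitude: a large term can swallow small
# ones entirely, depending on how the additions are grouped.
big, small = 1e16, 1.0
print((big + small) + small)  # 1e+16  (each small add rounds away)
print(big + (small + small))  # 1.0000000000000002e+16
```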
If you're going for bit-for-bit equivalence of float values, then even with a single operation you're relying on compiler flags, architecture, the phase of the moon... I'm hard-pressed to think of any memory safety issues though.
Yeah, you're in a fairly special niche of programming if you're somewhere where that truly matters and you can't accept the output of any valid evaluation order. In most general code, if that kind of precision matters, float is the wrong choice: use a bignum or exact rational type and be exactly correct regardless of how you organized your code.
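In Python, for example, the standard library's exact rationals make grouping irrelevant:

```python
from fractions import Fraction

a, b, c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)

# Exact rational arithmetic is associative: grouping never matters.
assert (a + b) + c == a + (b + c) == Fraction(3, 5)

# The same values as binary floats disagree depending on grouping.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```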
Which is a niche that exists, obviously. So it is absolutely true for some cases. But I would hope that any code that requires this is extremely clear about requiring it.
I'm sure you're right. Across tens (hundreds?) of thousands of institutions worldwide, each one is exercising its well-written incident runbook that not only gets updated regularly but also is rehearsed constantly, just in case something like this happens. After all, what university IT department DOESN'T prepare obsessively for the moment when they need to restore all grades on all assignments for all courses from backup and fall over to the backup system for final exam administration in any required format specified by any professor, in the second week of May, on a non-negotiable schedule? There's absolutely nothing to worry about here.
Yep. Thank God we fund school IT so generously, so everyone from Harvard to small state colleges has an absolute top notch IT department, dedicated to best practices, fully resourced to do BC/DR planning and dry runs. This could be a real catastrophe if any schools were under-resourced.
I'm Simon, an attorney and partner at a boutique law firm in New York City, where I have been representing clients in high-stakes commercial and real estate disputes for almost 20 years. I've also been building software for many years, long before AI assistants existed, though these days, like most of you, I use AI coding agents regularly to boost productivity.
Last year, I hit a wall: I simply could not get my AI assistants to follow my instructions to edit my files accurately and reliably. I found this to be true regardless of which model or client I chose, no matter how well I documented, and across all file types.
I began focusing on a specific scenario: The agent echoes back a perfectly reasonable plan to revise a file, calls its tools, and announces completion; but the file is corrupted. Even when it isn't literally broken, the diff shows wholesale replacements, many unrelated to the underlying issue, where small, surgical modifications were warranted. I call this last-mile failure pattern "Execution Slop".
After investigation, I concluded that Execution Slop cannot be fixed through prompt engineering or by paying for more expensive tokens, because the AI file-editing tools themselves are broken. All major AI coding assistants use the same string-replacement strategy for editing under the hood. Agents can't visualize their changes before committing them to disk, get no warning that they're about to break something, can't roll back changes atomically if they realize they made a mistake, and often can't even insert or delete at a specific line or line range (let alone a particular column), without also echoing everything around it.
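None of this is specific to one assistant; the generic failure mode of string-replacement editing is easy to reproduce. A minimal sketch (the snippet and variable names are made up for illustration):

```python
source = "total = 0\nfor x in xs:\n    total = total + x\n"

# Intended edit: rename the accumulator everywhere. A count-limited
# string replace hits only the first match:
patched = source.replace("total", "acc", 1)

print(patched)
# acc = 0
# for x in xs:
#     total = total + x

# The file is now silently inconsistent: `acc` is defined but the loop
# still updates `total`, and nothing warned the agent before the write.
```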
So I spent nearly a year building something completely different. HIC Mouse gives AI agents line- and coordinate-based editing through a natural syntax that allows agents to edit concisely by declaring region boundaries instead of forcing them to use string replacement. All multi-operation and large operations are automatically staged in memory before touching disk, triggering a Dialog Box mode, in which the agent can save, cancel, inspect, or refine. If something goes wrong, the agent can roll back edits atomically. If most of a batch succeeds but one operation fails, the agent can fix just the failure without discarding the rest. And agents are given embedded contextual guidance at every tool call.
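To make the staging idea concrete, here's a minimal sketch of all-or-nothing, line-addressed editing. This is my illustration of the general technique, not HIC Mouse's actual implementation; the function name and edit format are invented:

```python
import os
import tempfile

def apply_line_edits(path, edits):
    """Stage a batch of line edits in memory; write atomically or not at all.

    edits: list of (start, end, replacement_lines) using 1-indexed,
    inclusive line ranges. Hypothetical API, for illustration only.
    """
    with open(path) as f:
        lines = f.readlines()

    # Validate the entire batch before touching anything on disk.
    for start, end, _ in edits:
        if not (1 <= start <= end <= len(lines)):
            raise ValueError(f"range {start}-{end} out of bounds; nothing written")

    # Apply bottom-up so earlier ranges keep their original numbering.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        lines[start - 1:end] = [l if l.endswith("\n") else l + "\n" for l in replacement]

    # Write to a temp file in the same directory, then rename:
    # the edit lands atomically or the original file is untouched.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    os.replace(tmp, path)
```

The point of the sketch is the ordering of responsibilities: validate the whole batch, mutate only an in-memory copy, and commit via an atomic rename, so a half-applied batch can never reach disk.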
To validate rigorously that HIC Mouse genuinely improves outcomes, I ran three preregistered confirmatory studies (N=67 paired runs) comparing Mouse-enabled AI assistants running in isolated Docker containers performing timed, realistic file-editing tasks ranging in difficulty, against identically configured agents using built-in editing tools. I've uploaded the technical report and statistical analysis with all the details, but the bottom line is that Mouse dramatically improved performance (Cohen's h > 2 or "massive" effect size on multiple metrics), across every dimension that I studied -- capability, speed, cost, reliability, and most importantly, accuracy.
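For anyone unfamiliar with the metric: Cohen's h measures the difference between two proportions on an arcsine-transformed scale, where h of roughly 0.8 is conventionally read as "large". The success rates below are invented purely to show what scale of difference produces h > 2; they are not the report's numbers:

```python
import math

def cohens_h(p1, p2):
    """Effect size for the difference between two proportions."""
    phi = lambda p: 2 * math.asin(math.sqrt(p))
    return abs(phi(p1) - phi(p2))

# Hypothetical success rates, chosen only to illustrate the scale:
h = cohens_h(0.99, 0.10)
print(round(h, 2))  # 2.3
```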
We have now officially launched, and HIC Mouse is available for download through the VS Code Marketplace and Open VSX. Mouse works with VS Code, Cursor, and Kiro, and it's compatible with GitHub Copilot, Claude Code, and other MCP clients.
Please consider installing HIC Mouse, and let me know what you think! I really hope that it genuinely makes a positive difference for you.
Now let me ask you: How often do you encounter Execution Slop, and what are you doing to avoid it?
Interestingly, many people say "zugzwang" whenever one player has only losing moves and would love to skip their turn altogether, but that by itself isn't zugzwang. As a non-example, consider White with Kb6 and Rc6 against a lone Black Kb8. After the waiting move 1. Rc5, Black has no choice but 1...Ka8, and 2. Rc8# follows. Black is not in zugzwang, though: the obligation to move hurts only Black, and he is lost no matter whose turn it is. For a true example of zugzwang, take White with Kf5 and a pawn on e4 against Black with Kd4 and a pawn on e5. This position is zugzwang because whichever player has to move abandons the defense of his own pawn, and with it the game. If it's White to move, for instance, the game could continue 1. Kf6 Kxe4 2. Kg5 Kf3 3. Kf5 e4, and Black simply marches his e-pawn to the 1st rank, promotes to a queen, and mates shortly after.
"There are three types of chess positions: either none, one, or both of the players would be at a disadvantage if it were their turn to move. The great majority of positions are of the first type. In chess literature, most writers call positions of the second type zugzwang, and the third type reciprocal zugzwang or mutual zugzwang. "
The Wikipedia article goes on to say that other authors describe the second type as a "squeeze" (I think Kemp uses that term) and reserve "zugzwang" for the mutual or reciprocal kind. I can't remember if it was GM Edmar Mednis or IM Rafael Klovsky who told me many years ago that only the mutual scenario qualifies as a "true" zugzwang, but I'm pretty sure it was one or both of them. Either way, the subject has divided chess authors almost since the term's inception. See, for instance, the Wikipedia article on the Immortal Zugzwang Game, one of the earliest famous examples of "zugzwang" and featured in Nimzowitsch's classic treatise "My System"; at the same time, many other famous players, IM Andy Soltis among them, disagreed with the use of the term for that game.
A great article with some really beautiful examples of zugzwang is: https://www.chesshistory.com/winter/extra/zugzwang.html. There's a very nice discussion at the end as well of a disagreement along just these lines as to what truly constitutes zugzwang, between Hooper and Myers.
Hey, this is pretty neat! I'd definitely try using this for benchmarks and other places where I need strong isolation, since Docker is just too bloated and slow, but sadly I don't think I can run this natively on my Windows laptop. I hope you extend it to WSL! Good luck and congrats on the launch.
Hey, thanks so much for the feedback! Yeah, try it and let us know. We have a Discord if you want to join, but feel free to report any issues you find to us on either GitHub or Discord.
Tailwind is fantastic precisely because its biggest benefit (tree-shaking, so you ship only the CSS you actually use) massively outweighs the fact that Tailwind syntax "looks like" an anti-pattern and makes your code "look" ugly. You also get used to bundling your styling and JS in one place with any component-driven framework like Next.js/React, and Tailwind works seamlessly with all of them. I guess I just prefer the benefits, and the collateral damage of the alternative definitely isn't worth making front-end code merely look simpler.
I respectfully disagree that Mythos was important because of its findings of zero-day vulnerabilities. The point is that Mythos apparently can fully EXPLOIT the vulnerabilities it finds, putting together the actual attack scripts and executing them, often by taking advantage of disparate issues spread across multiple libraries or files. Lots of tools can and do identify plausible attack vectors reliably, including SASTs and AI-assisted analysis.

The real challenge in replicating Mythos, in my view, is to determine whether, under the precise conditions of a particular code base and configuration, the alleged vulnerability actually is reachable and exploitable; and then, not just to answer that question in the abstract, but to build a concrete, end-to-end proof of concept demonstrating the vulnerability. My understanding from the Project Glasswing post is that the latter is what Mythos is exceptionally good at, and it's what distinguishes it from SASTs and from simply asking an AI: work done until now only by a handful of cybersecurity experts. Up to this point, generating an exploit PoC, as opposed to merely ascertaining that one might be possible, has been achievable with existing tools only with a lot of work and oversight by a programmer experienced in cybersecurity exploits.

I don't have any reason to doubt the conclusion that GPT-5.4 and Opus 4.6 can spot many of the same issues that Mythos found. What would be genuinely interesting is testing whether GPT-5.4 or Opus 4.6 can also generate a proof of concept of the attack. In my experience, those agents can generate portions of an attack, but putting the whole thing together runs into two hurdles: (1) guardrails, and (2) overall difficulty, lack of imagination, and lack of capability to implement all the disparate parts.
I don't know if Mythos is capable of what is being claimed, but I do think it's important to understand why their claims are so significant. It's definitely NOT the mere ability to find possible exploits.
Exactly right. The differentiator is that it can create working exploits for about 75% of the issues it found, including chaining together several different ones to achieve quite tricky exploits.