> But then the clean room implementations started showing up. People had taken A...

mergesort · 2026-04-02T04:21:05 1775103665

Heya, post author here. I think I was just wrong about this assertion. I got into a discussion with a copyright lawyer over on Bluesky[^1] after I wrote this and came away reasonably convinced that this wouldn’t be a valid example of a clean room implementation.

[^1]: https://bsky.app/profile/mergesort.me/post/3mihhaliils2y

aeternum · 2026-04-02T05:06:15 1775106375

The most fitting method would be to train an LLM on the Claude Code source code (among other data).

Then use Anthropic's own argument that LLM output is original work and thus not subject to copyright.

recursive · 2026-04-02T03:47:44 1775101664

I think it means you write a spec from the implementation. Then you write a new implementation from the spec. You might go so far as to do the second part in a "clean" room.

m132 · 2026-04-02T04:46:49 1775105209

Heh, the original being entirely vibed had me thinking of an interesting problem: if you used the same model to generate a specification, then reset the state and passed that specification back to it for implementation, the resulting code would by design be very close to the original. With enough luck (or engineering), you could even get the same exact files in some cases.

Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?

This is going to be a decade of very interesting, and probably often hypocritical, lawsuits.

roywiggins · 2026-04-02T03:51:29 1775101889

right. that's not what people are doing here though, at all

john_strinlai · 2026-04-02T03:52:34 1775101954

in a typical clean-room design, the person writing the new implementation is not supposed to have any knowledge of the original, they should only have knowledge of the specification.

if one person writes the spec from the implementation, and then also writes the new implementation, it is not clean-room design.

post_below · 2026-04-02T03:58:35 1775102315

I believe the argument is that LLMs are stateless. So if the session writing the code isn't the same session that wrote the spec, it's effectively a clean room implementation.

There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.