Hacker News | davesque's comments

> We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

It feels like this is a losing strategy. Claude should be developing secure software and also properly advising on how to do so. The goals of censoring cyber security knowledge and also enabling the development of secure software are fundamentally in conflict. Also, unless all AI vendors take this approach, it's not going to have much of an effect in the world in general. Seems pretty naive of them to see this as a viable strategy. I think they're going to have to give up on this eventually.


The fundamental tension is that the models are getting weirdly good at hacking while still sort of sucking at a bunch of economically valuable tasks.

So they've hit the point where the models are simultaneously too smart (dangerous hacking abilities) and too stupid (can't actually replace most employees). So at this point they need to make the models bigger, but they're already too big.

So the only thing left to do is to make them selectively stupider. I didn't think that would be possible, but it seems like they're already working on that.


models are getting weirdly good at hacking while still sort of sucking at a bunch of economically valuable tasks

like most human hackers


Honestly, I sometimes feel like about the only thing they do successfully is hacking, and not just in the sense of breaking into systems that are assumed to be secure (though also in that sense). They're just highly effective at fumbling around with a hatchet until something works. We happen to have version control and automated testing, which generally makes that approach somewhat viable for programming. But while I've been genuinely impressed at how much they can put features into a workable state, I've never been confident, looking at the output, that it will be more than POC quality at the current state of things. They are pretty dang effective at that, though, given enough time and a safe space to hack away and reset until the product looks close enough.

Yes, it's a losing strategy; no one else is going to do this. They are inviting parties to partner with them, so it's not totally in conflict. I'm sure there's genuine concern coming out of Anthropic, but I also think at this point they've likely culturally internalized "Dangerous [think: powerful] AI" as a brand narrative.

"Beware of Mythos!" reads to me as standard Anthropic/Dario copy. Is it more true now than it was before? Sure. Is now the moment that the world's digital infrastructure succumbs to waves of hackers using countless exploits? I doubt it.


Curious how the safeguards work and what impact they will have.

In general I feel that over-engineering safeguards in training comes at a noticeable cost to general intelligence. Like asking someone to solve a problem on a white board in a job interview. In that situation, the stress slices off at least 10% of my IQ.


This is the company that allowed a vibe-release resulting in the leaking of the entirety of the Claude Code codebase. What is the bar you're expecting here, exactly?

I feel it’s fine as a short term solution, and probably a good thing. Gives the good guys some time to stay on top.

Always remember: a defender must succeed every time, an attacker only once.


Given the list of very large companies in the "glasswing" project, it is likely that every competent state actor and criminal organization already has access to Mythos in one way or another. Meanwhile, the open-source volunteers responsible for the security of the entire internet don't have access.

Yeah, it is easier to destroy than to create. Models will always be better at hacking than at building.

So then why expect that you're making the world safer by limiting the capability that your vendor locked customers have access to while attackers will go find the best de-censored model that works for them, wherever they can find it?

I'm not a security expert and don't know how to properly audit every github repo that I come across. Maybe I sometimes want to build gnome extensions or cool software projects from source and I want some level of checking along the way for known vulnerabilities. They can't claim this is an obvious win for security when it centralizes rather than democratizes security.

I interpreted their actions as providing time for vendors to protect themselves against the new model proactively, not to nerf the models themselves.

Although perhaps I am naive.


Am I reading the article wrong? It appears that the author did not test the claims of the proof. Wouldn't a "bug" in this case mean she found an input that did not survive a round trip through the compression algorithm?

Update: Actually, I guess this may have been her point: "The two bugs that were found both sat outside the boundary of what the proofs cover." So then I guess the title might be a bit click-baity.
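For reference, the round-trip check I had in mind is a simple property test. A minimal sketch, using Python's zlib as a stand-in for the verified implementation (the actual project is in Lean):

```python
import os
import zlib

# Property: decompressing a compressed input yields the original bytes.
# A "bug" in the core algorithm, in the sense I meant, would be an input
# for which this round trip fails.
for _ in range(1000):
    data = os.urandom(256)
    assert zlib.decompress(zlib.compress(data)) == data
```

That's the property the Lean proofs actually cover, which is why neither reported bug is a counterexample to it.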


Hi! Author here. When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.

If a buffer overflow causes the system to be exploited and all your bitcoins to be stolen, I don't think the fact that the bug was in the language runtime is going to be much consolation. Especially if the software you were running was advertised as formally verified to be free of bugs.

Secondly, I did find a bug in the algorithm: in Archive.lean, in the parsing of the compressed archive headers. That was the crashing input.


> I think it's fair to consider the entire binary a fair target.

Yes, it's still very much a bug. But it has nothing to do with your program being formally verified or not. Formal verification can do nothing about any unverified code you rely on. You would really need a formal verification of every piece of hardware, the operating system, the runtime, and your application code. Short of that, nobody should expect formal verification to ensure there are no bugs.
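To make that concrete, here's a hedged sketch (hypothetical names, not the actual lean-zip code): a "verified" core can be perfectly correct while the unverified glue that feeds it mishandles a length field it reads from the input itself.

```python
def verified_core(payload: bytes) -> bytes:
    # Stand-in for the formally verified algorithm: total and correct
    # on every input it actually receives.
    return payload[::-1]

def parse_header(buf: bytes) -> bytes:
    # Unverified glue: trusts a length field taken from the input.
    # The proofs say nothing about this code, so a malformed header
    # can still take the whole binary down.
    n = buf[0]
    payload = buf[1:1 + n]
    if len(payload) != n:
        raise ValueError("truncated archive")
    return verified_core(payload)
```

The proof boundary sits at the call into verified_core; everything before it is ordinary, fallible code.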


I read it as that’s also the point. Adding formal verification is not a strict defense against bugs. It is in a way similar to having 100% test coverage and finding bugs in your untested edge cases.

I don’t think the author is attempting to decry formal verification, but the article carries a good message everyone should keep in mind: safety is a larger, whole-system process, and bugs live in the cracks and interfaces.


You're right. It just seems as though it should be self-evident. Especially to those sophisticated enough to understand and employ formal verification.

It does seem that way, doesn't it? But as software bugs become easier to find and exploit, I'm expecting more and more people, including those not "sophisticated enough" to understand and employ formal verification, to start using it.

> I'm expecting more and more people

Then it would help to not introduce any confusion into the ecosystem by using a click-baity title that implies you found a bug which violated the formal specification.


We should not cater to people who make decisions based on titles instead of reading the actual article.

That's a shitty rationale for click-bait titles. Good titles are for the benefit of people who actually read the articles too.

Thanks for responding!

> When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.

Yeah, I would actually agree. We wouldn't want to advertise that a system is formally verified in some way if that creates a false sense of security. I was just pointing out that, by my reading, the title appears to suggest that the core mechanism of the Lean proof is somehow flawed. When I read the title, I immediately thought, "Oooh. Looks like someone demonstrated a flaw in the proof. Neat." But that's not what is shown in the article. Just feels a bit misleading is all.


OK, so now I see the shadow edit you did for the code source, thanks. Unfortunately, it shows that you are incorrect. For one, the function is a private function and can only be called by local code. Everywhere that the function is called, the size given to it is verified by the program; there is even a note that says it limits the maximum zip file size to avoid a zip bomb. In addition, the code you are quoting isn't even the final code; it is an interim step from what Claude was iterating on. Sucks that this got so much traction, as you are purposely being deceptive in trying to say that this is a bug. You intentionally removed the 'private' keyword in the function signature, as you knew that it would tip off most people to then check when it is actually used.

Sorry to hijack the thread. Really cool post. How long did the whole exercise, including porting zlib to Lean, take?

I have a hard real-time system that I would love to try this on, but that's a lot of tools to learn, and it's unclear how to model distributed systems in Lean.

Also, please add RSS so I can subscribe to your blog.


Lean-zip was not my project but one by others in the Lean community. I'm not sure about the methodological details of their process; you might want to check with the original lean-zip authors (https://github.com/kim-em/lean-zip).

I notice you didn't put a code reference for the second bug. Where is the code exactly?

Wouldn't it be more correct to call the article "critical" and not "incendiary"? I looked it over and I don't remember seeing any calls to violence. Altman needs to remember that he holds an incredible amount of power in this moment. He and other current AI tech leaders are effectively sitting on the equivalent of a technological nuclear bomb. Anyone in their right mind would find that threatening.

"Critical" even feels strong. The article was essentially a collection of statements others have made about Sam.

Right, but the picture those statements painted collectively was not flattering. And that was certainly intended by the authors. Thus, critical, but not at all "incendiary."

Update: To clarify, my personal stance is that the critical tone was both intended by the authors and, in my opinion, appropriate given how much power Mr. Altman holds. If he has a history of behaving inconsistently, that deserves daylight.


Are you arguing that because the authors knew the pattern they were documenting was unflattering, the piece is somehow compromised? That they clearly had an agenda? That's called reporting. They called a hundred-plus named sources and the picture those sources independently painted was damning. Altman has a history of telling repeated, easily-checked lies, followed by fresh lies when caught in the first ones.

Are you suggesting that they should have "both sides"-ed by reporting company PR and Sam-friendly sources and giving them equal weight? Sometimes the facts point in one direction.


> Are you arguing that because the authors knew the pattern they were documenting was unflattering, the piece is somehow compromised?

Uh, no? Lol, I'm on your side, bud. Put away the pitchfork. I thought it was a really good and fair article. I am not the adversary you're looking for.


> my personal stance is that the critical tone was both intended by the authors

You may think we are on the same side. You don't understand what side I'm on. "Lol".

Your "personal stance" is that you can get inside the heads of the reporters? Obviously not. So you're going by the idea that an article that leads to critical conclusions is inherently slanted. This is an insidious and damaging idea. It has led to the belief by journalists and editors that they need to twist themselves into pretzels to present "both sides", which is easily exploited by people of bad faith to launder outright lies. There's a direct line between this and authoritarianism. I'm quite serious about this. The fact that you agree with the authors in this case is completely orthogonal.

Jay Rosen has written a lot about this, well worth reading: https://pressthink.org/2010/11/the-view-from-nowhere-questio...


Every article is inherently biased due to the fact that there are inclusions and omissions. This is just a fact.

You're injecting your own personal view into GP's statement by adding a lot of weight into the distinction between the words "critical" and "incendiary" and "neutral", when GP made a very neutral and not as charged statement.


Look if you're looking for a fight just visit a local martial arts gym.

Bud. Put the keyboard down and relax. I have no idea what you're talking about. You've extrapolated all this just from what I wrote?

> You've extrapolated all this just from what I wrote?

says the guy who said "certainly intended by the authors" based on... what they wrote?

On top of that "Put the keyboard down and relax" from the guy who keeps replying?

<chef's kiss>

> I have no idea what you're talking about.

The one point I'll concede!


I love reading stuff like “Critical, slanted, and compromised mean the same thing. They are interchangeable words.”

Given that, it looks like your position on davesque’s posts is slanted. Your take is critical of those posts, which means your assessment is compromised, and as such should not be taken as valid.


And I love seeing sentiments attributed to me, in quotes even, that I didn't state or imply, and certainly don't believe. "Critical" by itself is not a synonym for "slanted". However the post I was commenting on was:

> Right, but the picture those statements painted collectively was not flattering. And that was certainly intended by the authors. Thus, critical, but not at all "incendiary."

The key there is "certainly intended by the authors". The full sentiment here IS equivalent to "slanted".


It is clearly your intent to be critical of davesque’s posts. QED your analysis is compromised

Sam posted a tweet saying "incendiary" was the wrong word choice. https://x.com/sama/status/2042789312400363702

The whole article is about how Sam will say one thing and then deny/opposite later

Anything but unqualified praise and endorsement is egregious harassment.

> Wouldn't it be more correct to call the article "critical" and not "incendiary"?

Sure, but not useful for the overarching aim of equating criticism of the powerful with (stochastic) terrorism.


Yes, the temperature has gone up. And we all know exactly who sits at the top of it all.

I'm honestly completely in favor of this. Anthropic obviously budgets their capacity based on projected human usage patterns coming through their native app suite (Claude Cowork, Claude Code, etc.). They should not be expected to shoulder the burden of community tools like OpenClaw that are effectively designed to strategically max out usage windows on the plan. That has clearly caused issues with uptime in the past couple of months and I've gotten pretty fed up with the degraded service quality while I'm just trying to use Claude Code as it's intended to be used. I'm happy to see they're doing something about this. Seems like a totally fair move to me. I'd rather that Claude Code functions well when I'm using it according to its design.

I wish these AI vendors would quit publishing comparisons with the previous generation of their competitors' models. It's just such a glaringly bad look and no one is fooled by it, even if their achievements deserve praise in their own right. The Qwen models are great and don't deserve the reputational hit that comes from dodgy marketing tactics.


Yeah omlx seems to me like the front runner right now for running MLX models locally in agent workflows (which depend heavily on caching).


I'm not sure this is the issue. I asked Claude Code a simple question yesterday. No sub agents. No web fetches. Relatively small context. Outside of peak hours. Burned 8% of my Max 5x 5hr usage limit. I've never seen anything like this before, even when the cache is cold.


You don't. Most of the time (after the first prompt following a compaction or context clear) the context prefix is cached, and you pay something like 10% of the cost for cached tokens. But your total cost is still roughly the area under a line with positive slope, so it increases quadratically with context length.
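The arithmetic, sketched with an assumed 10x discount for cached input tokens (the exact ratio and cache behavior vary by vendor and model):

```python
FRESH = 1.0   # relative price per uncached input token (assumption)
CACHED = 0.1  # relative price per cached input token (assumption)

def conversation_cost(tokens_per_turn: int, n_turns: int) -> float:
    """Total input cost when each turn's prefix is served from cache."""
    total = 0.0
    context = 0  # tokens already sitting in the cached prefix
    for _ in range(n_turns):
        total += context * CACHED + tokens_per_turn * FRESH
        context += tokens_per_turn
    return total
```

Even at a 10% cache rate, the cached-prefix term is a triangular sum, so doubling the conversation length roughly quadruples that component of the bill.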


Yeah, the viz for polar quantization is straight up nonsensical. Okay, so some colors are converted into clocks and then into a bigger box with a pink box inside of it. Got it. Even understanding what polar coordinates are doesn't help you make sense out of it.

