Hacker Newsnew | past | comments | ask | show | jobs | submit | bothlabs's commentslogin

I would expect, it still is only enforced in a semi-strict way.

I think what they want to achieve here is less "kill openclaw" or similar and more "keep our losses under control in general". And now they have a clear criteria to refer when they take action and a good bisection on whom to act on.

In case your usage is high they would block / take action. Because if you have your max subscription and not really losing them money, why should they push you (the monopoly incentive sounds wrong with the current market).


Openclaw is unaffected by this as the Claude Code CLI is called directly


Many people use the Max subscription OAuth token in OpenClaw. The main chat, heartbeat, etc., functionality does not call the Claude Code CLI. It uses the API authenticated via subscription OAuth tokens, which is precisely what Anthropic has banned.

There are many other options too: direct API, other model providers, etc. But Opus is particularly good for "agent with a personality" applications, so it's what thousands of OpenClaw users go with, mostly via the OAuth token, because it's much cheaper than the API.


Crazy how odd fire can behave, learned suprisingly much in that vid. Exactly the kind of thing that educates you during lunch ;)


The Cerebras partnership is the most interesting part of this announcement to me. 1000+ tok/s changes how you interact with a coding model. At that speed the bottleneck shifts from waiting for the model to keeping up with it yourself.

Curious how the capability tradeoff plays out in practice though. SWE-Bench Pro scores are noticeably lower than full 5.3-Codex. For quick edits and rapid prototyping that's probably fine, but I wonder where the line is where you'd rather wait 10x longer for a correct answer than get a wrong one instantly.

Also "the model was instrumental in creating itself" is doing a lot of heavy lifting as a sentence. Would love to see more details on what that actually looked like in practice beyond marketing copy.


More like shifts from waiting for the model to https://xkcd.com/303/ .

Unless you use garbage languages, of course.


Having worked with ML-based sensing for years, what stands out here isn't the accuracy (near 100%), it's the simplicity. No specialized hardware, no cameras, no cooperation from the target needed. Just passive observation of unencrypted beamforming feedback that every modern router already broadcasts.

The window to embed privacy protections into the IEEE 802.11bf standard is closing. Once this is ratified without safeguards, retrofitting privacy will be much harder.


Ok very cool!

I already had built a hook with desktop notification and window highlighting myself. But I have to admit, making it fun like this beats it by a lot.


This matches what I see from the other side. I'm a solo founder with 10 years in AI startups and my workflow is about 80% AI generated code. It works because I know what's wrong when the AI generates something subtly broken, and I know which corners not to cut.

The real insight the article misses is that AI coding tools actually widen the gap. The more senior you are, the more you accelerate. You're a better reviewer, you scope tasks correctly, you catch the nonsense faster. The friends in this post didn't fail because the tools are bad. They failed because reviewing code is harder than writing it, and you can't review what you don't understand.


This is a neat idea. At my last company (Octomind) we built AI agents for end-to-end testing and ran into the indirect injection problem constantly. Agents that browse or interact with web pages are especially vulnerable because you can't sanitize the entire internet.

The thing that surprised me most was how unreliable even basic guardrails were once you gave agents real tools. The gap between "works in a demo" and "works in production with adversarial input" is massive.

Curious how you handle the evaluation side. When someone claims a successful jailbreak, is that verified automatically or manually? Seems like auto-verification could itself be exploitable.


Yeah the demo-to-production gap is massive. We see the same thing with browser agents being potentially the most vulnerable. And I think this is because of context being stuffed with the web page html that it obscures small injection attempts.

Evaluation is automated and server-side. We check whether the agent actually did the thing it wasn’t supposed to (tool calls, actions, outputs) rather than just pattern-matching on the response text (at least for the first challenge where the agent is manipulated to call the reveal_access_code tool). But honestly you’re touching on something we’ve been debating internally - the evaluator itself is an attack surface. We’ve kicked around the idea of making “break the evaluator” an explicit challenge. Not sure yet.

What were you seeing at Octomind with the browsing agents? Was it mostly stuff embedded in page content or were attacks coming through structured data / metadata too? Are bad actors sophisticated enough already to exploit this?


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: