I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?
The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).
In one case I actually encountered a situation where I felt that the model was deliberately failing to execute a particular task, and when queried, the tool output said it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Golden Gate Claude territory, hence my earlier contemplation on steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on reddit, so I don't think it's just me!
Note that these are the "chat" system prompts - although it's not mentioned I would assume that Claude Code gets something significantly different, which might have more language about malware refusal (other coding tools would use the API and provide their own prompts).
Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.
The "Picking delaySeconds" section is quite enlightening.
I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.
> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:
> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.
> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.
> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.
> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.
> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.
> The runtime clamps to [60, 3600], so you don't need to clamp yourself.
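The decision rules in the quoted section can be condensed into a small helper. This is a hedged illustration, not anything from Anthropic's actual harness: the 300s TTL, the "don't pick 300s" rule, and the [60, 3600] clamp come from the quote, while the function name and the 30-second safety margin are my own choices.

```python
def pick_delay_seconds(expected_wait_s: float, cache_ttl_s: int = 300) -> int:
    """Pick a sleep that respects a 5-minute prompt-cache TTL.

    Either stay safely inside the cache window (<= 270s) or commit to a
    long sleep so one cache miss buys a long wait. Never pick ~300s:
    that pays the miss without amortizing it.
    """
    if expected_wait_s <= cache_ttl_s - 30:        # cache stays warm
        delay = int(expected_wait_s)
    elif expected_wait_s < 1200:                   # the worst-of-both zone
        delay = cache_ttl_s - 30                   # poll from inside the window instead
    else:                                          # genuinely idle: eat one miss
        delay = int(expected_wait_s)
    return max(60, min(delay, 3600))               # runtime clamps to [60, 3600]

# The 8-minute build from the quote: sleep 270s (twice) rather than 60s
# eight times, so the cache is never cold mid-build.
print(pick_delay_seconds(8 * 60))
```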
It's definitely not clear, if you're only used to the subscription plan, that every single interaction triggers a full context load. It's all one session to most people. So long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that you wouldn't incur that much context-loading cost. But this suggests that's not true at all.
They really should have just set the cache window to 5:30 or some other slightly odd number instead of using all those tokens to tell Claude not to pick one of the most common timeout values.
This is somewhat obvious if you realize that HTTP is a stateless protocol and Anthropic also needs to re-load the entire context every time a new request arrives.
The part that does get cached - attention KVs - is significantly cheaper.
If you read documentation on this, they (and all other LLM providers) make this fairly clear.
For people who spend a significant amount of time understanding how LLMs and the associated harnesses work, sure. For the majority of people who just want to use it, it's not quite so obvious.
The interface strongly suggests that you're having a running conversation. Tool calls are a non-interactive part of that conversation; the agent is still just crunching away to give you an answer. From the user's perspective, the conversation feels less like stateless HTTP where the next paragraph comes from a random server, and more like a stateful websocket where you're still interacting with the original server that retains your conversation in memory as it's working.
Unloading the conversation after 5 minutes of idling can make sense to most users, which is why the current complaints in HN threads tend to align with that 1-hour-to-5-minute timeout change. But I suspect a significant amount of what's going on is with people who:
* don't realize that tool calls really add up, especially when context windows are larger.
* had things take more than 5 minutes in a single conversation, such as a large context spinning up subagents that are each doing things that then return a response after 5+ minutes. With the more recent claude code changes, you're conditioned to feel like it's 5 minutes of human idle time for the session. They don't warn you that the same 5 minute rule applies to tool calls, and I'd suspect longer-running delegations to subagents.
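To see why tool calls add up, some back-of-envelope arithmetic helps. The prices below are illustrative placeholders, not Anthropic's actual rates; the point is the ratio between uncached and cached input tokens when a large context gets re-sent on every request.

```python
# Rough cost of re-sending a large conversation for N sequential requests
# (each tool-call round trip is a fresh request over the whole context).
# Rates are hypothetical; only the ~10x hit/miss ratio matters here.
CONTEXT_TOKENS = 150_000
UNCACHED_PER_MTOK = 3.00   # hypothetical $/M input tokens on a cache miss
CACHED_PER_MTOK = 0.30     # hypothetical $/M input tokens on a cache hit

def context_cost(n_calls: int, hit_rate: float) -> float:
    """Dollar cost of shipping the context for n_calls requests."""
    hits = n_calls * hit_rate
    misses = n_calls - hits
    return CONTEXT_TOKENS * (
        hits * CACHED_PER_MTOK / 1_000_000
        + misses * UNCACHED_PER_MTOK / 1_000_000
    )

# 40 tool calls over a 150k-token context: all cache hits vs. all misses.
print(f"all hits:   ${context_cost(40, 1.0):.2f}")   # $1.80
print(f"all misses: ${context_cost(40, 0.0):.2f}")   # $18.00
```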
Unless I'm parsing your reply very badly, I see no world in which anything dealing with HTTP would be more expensive than dealing with the KV cache (loading it from "cold" storage, deciding which compute unit to load it into, doing the actual computations for the next call, etc.).
No, that’s not the issue. What people fail to understand is that every request (e.g. every message you send, but also every tool-call response) requires the entire conversation history to be sent, and the LLM provider needs to reprocess it.
The attention part of LLMs (that is, for every token, how much attention it pays to all other tokens) is cached in a KV cache.
You can imagine that with large context windows, the overhead becomes enormous (attention is quadratic in sequence length).
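A toy operation count makes the scaling concrete. This is a sketch of the principle only (ignoring heads, head dimensions, and batching): without cached keys/values, reprocessing an n-token conversation redoes on the order of n² attention scores; with a warm KV cache, only the new tokens attend over the stored prefix.

```python
# Toy count of query-key dot products for one request over a conversation,
# ignoring heads and hidden dimensions. Illustrates the hit/miss gap only.
def attention_ops(n_ctx: int, n_new: int, cached: bool) -> int:
    if cached:
        # only the new tokens compute attention, over the full sequence
        return n_new * (n_ctx + n_new)
    # cold start: reprocess everything; token i attends to tokens 0..i
    total = n_ctx + n_new
    return total * (total + 1) // 2

# 100k-token conversation, 500 new tokens:
print(attention_ops(100_000, 500, cached=True))    # ~5e7 ops
print(attention_ops(100_000, 500, cached=False))   # ~5e9 ops, ~100x more
```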
No, you underestimate how huge the malware problem is right now. People try to publish fake download landing pages for shell scripts, or even for Claude Code itself, on https://playcode.io every day. They pay $$$ for Google ads to take the top position. How do Google Ads allow this? They can't verify every shell script.
No, I am not joking. Every time you install something, there is a risk you clicked a wrong page with the exact same design.
He's not talking about malware awareness. He's talking about a bug I've seen too, which has Claude adding extra malware-awareness justification turns for *every* tool call, like every file read of the repo we've been working on.
Their marketing is going overtime into selling the image that their models are capable of creating uber sophisticated malware, so every single thing they do from here on out is going to have this fear mongering built in.
Every statement they make, hell even the models themselves are going to be doing this theater of "Ooooh scary uber h4xx0r AI, you can only beat it if you use our Super Giga Pro 40x Plan!!". In a month or two they'll move onto some other thing as they always do.
> spending large amounts of token budget contemplating whether any particular code or task was related to malware development
It almost seems like they are making these models output like a neurotic person.
Soon these high profile models will get caught in analysis paralysis like Chidi in The Good Place.
They will spin around in circles wasting tokens on identifying and mitigating sociological implications while I'm just trying to get it to diagnose a race condition.
I "fixed" this for myself with tweakcc, which lets you patch the system prompts. I changed the malware part to just be "watch out for malware" and it's stopped being unaligned.
They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.
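As a sketch of that suggestion: screen file-read results with a small classifier model outside the main conversation, so the main model never spends context tokens deliberating about malware. Everything here is hypothetical; `screen_model.classify`, `ScreenResult`, and `guarded_read` are assumed interfaces for illustration, not any real Claude Code API.

```python
# Hypothetical side-channel malware screen for read() tool calls.
# The screening model sees the file content in its own context; the main
# model only ever sees clean content or a refusal, never the deliberation.
from dataclasses import dataclass

@dataclass
class ScreenResult:
    malicious: bool
    reason: str

def guarded_read(path: str, screen_model) -> str:
    """Read a file, but route the content through a cheap classifier first."""
    with open(path, "r", errors="replace") as f:
        content = f.read()
    verdict: ScreenResult = screen_model.classify(content)  # separate, lean model
    if verdict.malicious:
        # surface a finding instead of injecting justification turns
        raise PermissionError(f"read blocked: {verdict.reason}")
    return content  # clean content goes to the main context untouched
```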
The newest versions of the Claude Code package on npm just download the native executables and run that instead. Does tweakcc support that yet? Last time I tried it, there were some pretty huge error messages. For now I've been coping with a pinned version.
Presumably because it has become extremely good at writing software, and if it succeeds at helping someone spread malware, especially one that could use Claude itself (via local user's plans) to self-modify and "stay alive", it would be nearly impossible to put back in the bottle.
That would put itself back in the bottle by running killall to fix a stuck task, or deleting all core logic and replacing it with a to-do to fix a test.
This makes me think it would be nice to see some kinda child of modern transformer architecture and neural ODEs. There was such interesting work a few years ago on how neural ode/pdes could be seen as a sort of continuous limit of layer depth. Maybe models could learn cool stuff if the embeddings were somehow dynamical model solutions or something.
Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that in fact, there _is_ a taxonomy of 'hallucinations', that they might have different causes and representations, and that there are technical mitigations with varying levels of effectiveness.
AI detectors do not work. I have spoken with many people who think that the particular writing style of commercial LLMs (ChatGPT, Gemini, Claude) is the result of some intrinsic characteristic of LLMs - either the data or the architecture.
The belief is that this particular tone of 'voice' (chirpy sycophant), textual structure (bullet lists and verbosity), and vocab ('delve', et al) serves and will continue to serve as an easy identifier of generated content.
Unfortunately, this is not the case. You can detect only the most obvious cases of the output from these tools. The distinctive presentation of these tools is a very intentional design choice: partly the construction of the RLHF process, partly the incentives given to and selection of human feedback agents, and in the case of Claude, partly direct steering through SAs (sparse autoencoder activation manipulation). This is done for mostly obvious reasons: it's inoffensive, 'seems' to be truth-y and informative (qualities selected for in the RLHF process), and doesn't ask much of the user. The models are also steered to avoid having a clear 'point of view', agenda, point-to-make, and so on, characteristics which tend to identify a human writer. They are steered away from highly persuasive behaviour, although there is evidence that they are extremely effective at writing this way (https://www.anthropic.com/news/measuring-model-persuasivenes...). The same arguments apply to spelling and grammar errors, and so on. These are design choices for public-facing, commercial products with no particular audience.
An AI detector may be able to identify that a text has some of these properties in cases where they are exceptionally obvious, but fails in the general case. Worse still, students will begin to naturally write like these tools because they are continually exposed to text produced by them!
You can easily get an LLM to produce text in a variety of styles, some entirely dissimilar to normal human writing, such as unique ones which are the amalgamation of many different and discordant styles. You can get the models to produce highly coherent text which is indistinguishable from that of any individual person, with any particular agenda and tone of voice that you want. You can get the models to produce text with varying cadence, with incredible cleverness of diction and structure, with intermittent errors and backtracking, and anything else you can imagine. It's not super easy to get the commercial products to do this, but trivial to get an open source model to behave this way. So you can guarantee that there are a million open source solutions for students and working professionals that will pop up to produce 'undetectable' AI output. This battle is lost, and there is no closing Pandora's box. My earlier point about students slowly adopting the style of the commercial LLMs really frightens me in particular, because it is a shallow, pointless way of writing which demands little to no interaction with the text, tends to be devoid of questions or rhetorical devices, and in my opinion, makes us worse at thinking.
We need to search for new solutions and new approaches for education.
> We need to search for new solutions and new approaches for education.
Thank you for that and for everything you wrote above it. I completely agree, and you put it much better than I could have.
I teach at a university in Japan. We started struggling with such issues in 2017, soon after Google Translate suddenly got better and nonnative writers became able to use it to produce okay writing in English or another second language. Discussions about how to respond continued among educators—with no consensus being reached—until the release of ChatGPT, which kicked the problem into overdrive. As you say, new approaches to education are absolutely necessary, but finding them and getting stakeholders to agree to them is proving to be very, very difficult.
I recently deployed an AI detector for a large K12 platform (multi-state, 20k+ students), and they DO work in the sense of saving teachers time.
You have to understand, you are a smart professional individual who will try to avoid being detected, but 6th-12th grade students can be incredibly lazy and prone to procrastination. You may take the time to add a tone, style, and cadence to your prompt, but many students do not. They can be so bad you find the "As an AI assistant..." line in their submitted work. About 11% of assignments are blatantly using AI, and after manual review of over 3,000 submitted assignments, GPTZero proved quite capable, with very few (<20) false positives.
Do you want teachers wasting time loading, reviewing and ultimately commenting on clear AI slop? No you do not, they have very little time as is and that time will be better spent helping other students.
Of course, you need a process to deal with false positives, the same way we had one for our plagiarism detector. We had to make decisions many years ago about what percentage of false positives is okay, and what the process looks like when it's wrong.
Put simply, the end goal isn't to catch everyone, it's to catch the worst offenders such that your staff don't get worn down, and your students get a better education.
Doesn't Google Docs have a feature that shows writing history?
You could ask the students to start writing in Google Docs, and whenever someone gets a false positive, they can prove they wrote it through that.
And besides, 99% of people who use AI to write don't bother contesting a false positive, so giving students the right to contest the claim would not be that much of a problem long term.
Yeah, those are great points, and our students do use Google Docs today, and you are right most students do not even contest it.
We let them resubmit a new paper when they are caught, and they get some one on one time with a tutor to help move them forward. Typically they were stuck or rushing, which is why they dumped a whole AI slop assignment into our LMS.
Former high energy theorist here: things are not looking so good for high energy physics (both theoretical and experimental) which loosely speaking accounted for maybe 1/3-1/2 of Nobel Prizes in the 20th century. That’s part of the reason I got out. I’m inclined to say astrophysics and cosmology, another pillar of the fundamental understanding of the universe, isn’t doing that well either, probably in the okayish but not as exciting as it used to be territory. I’m not qualified to talk about other fields.
I think saying they're not looking good might be a bit of an exaggeration. Both high energy physics and astrophysics are in between generations of technology right now, which is why things are a bit slower than usual.
With astrophysics, we're probably going to need the more sensitive gravitational wave detectors that are in development to become operational for new big breakthroughs. With high energy physics, many particle colliders and synchrotron light sources seem to be undergoing major upgrades these days. While particle colliders tend to get the spotlight in the public eye and are in a weird spot regarding the expected research outcomes, light sources are still doing pretty well afaik.
This Nobel I think is mainly because AI has overwhelmingly dominated the public's perception of scientific/technological progress this year.
> With high energy physics, many particle colliders and synchrotron light sources seem to be undergoing major upgrades these days.
AFAIK synchrotron light sources are tools for materials science and other applied fields, not high energy physics. Did I miss something?
I am also puzzled by the "many particle colliders". There is currently only one capable of operating at the high energy frontier. It's getting a luminosity upgrade [1] which will increase the number of events, but those will still be the 14 TeV proton-proton collisions it's been producing for years. There is some hope that collecting more statistics will reveal something currently hidden in the background noise, but I wouldn't bet on it.
>AFAIK synchrotron light sources are tools for materials science and other applied fields, not high energy physics. Did I miss something?
When you put it like that, yeah, I was kinda being stupid. During my stint doing research at a synchrotron light source I was constantly told to focus on thinking like a physicist (rather than as a computer engineer) and most of the work of everyone who wasn't a beamline scientist was primarily physics focused, which is what led me to think that way. But you're right in that it might not make much sense for me to say that makes them high energy physics research tools first.
>I am also puzzled by the "many particle colliders". There is currently only one capable of operating at the high energy frontier. It's getting a luminosity upgrade [1] which will increase the number of events, but those will still be the 14 TeV proton-proton collisions it's been producing for years. There is some hope that collecting more statistics will reveal something currently hidden in the background noise, but I wouldn't bet on it.
The RHIC is also in the process of being upgraded to the EIC. But overall, yes, that's why I said they were in a 'weird' spot. I too am not convinced that the upgrades will offer Nobel-tier breakthroughs.
What are you considering "high energy physics"? "1/3-1/2 of Nobel Prizes in the 20th century" is a significant overestimation unless you are including topics not traditionally included in high energy physics. For example, there were many Nobel prizes in nuclear physics, which shares various parallels with high energy physics in terms of historical origins, experimental techniques, and theoretical foundations. But nuclear physics is in a very exciting era of experimental and theoretical developments, so your "not looking so good" description does not apply.
Much of nuclear physics was effectively “high energy physics” (or more appropriately named elementary particle physics) back in the day. They ceased to be elementary or high energy at some point. My very loose categorization is everything on the microscopic path towards the fundamental theories; and there’s another macroscopic path, cosmology.
Agreed on that. My disagreement is with the statement that everything that was once referred to as high energy physics is "not looking so good". Nuclear physics in particular does not feel stuck in the way I've heard some high energy physicists talk about their field.
As a layman, the visualization of black holes, the superstructure above and below the Milky Way, JWST’s distant galaxy discoveries, gravitational wave detectors as mentioned, and some of the Kuiper Belt observations all seem to be interesting and exciting.
"Theoretical physics" is such a big and ambiguous concept that physicists tend not to use the term in discussions. Theoretical work often involves a lot of numerical simulation on supercomputers these days, which are kind of their own "experiments". And it is usually more productive to just mention the specific field, e.g. astronomy, condensed matter, AMO, etc., and you can be sure there are always a lot of discoveries in each area.
Physics is not stuck in string theory as physics is not just high energy theoretical particle physics. There's also more going on in high energy theoretical particle physics than just "string theory".
Much of the experimental action in recent decades has been in low-energy physics: down near absolute zero, where quantum effects dominate and many of the stranger predictions of quantum mechanics can be observed directly. The Nobel Prizes in Physics for 1996, 1997, 1998, 2001, and 2003 were all based on experimental work down near absolute zero.
Please bro just one more collider. Just one more collider bro. I swear bro we're gonna fix physics forever. Just one more collider bro. We could go up or even underground. Please bro just one more collider.
L-theanine (200mg) with around 100-150mg of caffeine has an extremely noticeable, positive effect on my ability to focus, feeling of "well-situatedness", and overall calmness. L-theanine by itself doesn't seem to do much. Caffeine on its own wakes me but makes me feel jittery and anxious, so it's definitely an interaction effect. Taurine has a much smaller effect on calmness, sans interactions - often indistinguishable from any other mild focus exercise like box breathing or stretching.
Quite good support for the caffeine/theanine interaction in the literature -- I was mainly trying to see if it could improve sleep, which is why I didn't take it with caffeine. Would be interesting to do some blinded cognitive test at some point to get an estimate of how much cognitive performance is increased. I need to think about ways to measure that, aside from flashcards. I actually have the website connected to a chess API, so that might be a nice test.
Police in large American cities are not likely to be of much assistance in this situation. Assuming they attend at all, I would expect them to not understand the nature of the issue and probably proceed to make it much worse.