
My suspicion is they have an overall fixed cache size that evicts the oldest records. They’re now overflowing with usage and consistently dumping fresh caches.

During core US business hours, I have to actively keep a session going or I risk a massive jump in usage while the entire thread rebuilds. During weekend or off-hours, I never see the crazy jumps in usage - even if I let threads sit stale.


This is my exact experience as well.

It’s further frustrating that I have committed to certain project deadlines knowing that I’d be able to complete them in X amount of time with agent tooling. That agentic tooling is no longer viable, and I’m scrambling to readjust expectations and how much I can commit to.


And it works largely because the other models haven’t figured out how to provide a consistent, long-running experience.

I’ve never been actually rate limited. Usage limits display in yellow when you’re above 90%. At the limit, you’ll get a red error message.

This is the problem most people are facing. Before March, I had hit the rate limit a single time. That involved a security audit of our entire code base from a few different angles.

As of now, I’m consistently hitting my 5-hour limit in less than an hour during North American business hours. I’m getting to the point where I basically can’t use CC for work unless I work very early or late in the day.


I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That’s just a wall of AI garbage.

Here’s what I’ve done to mostly fix my usage issues:

* Turn on max thinking on every session. It saves tokens overall because I’m not correcting it or having it waste energy on dead paths.

* Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all tokens need to be rebuilt; this gets especially bad as token usage goes up.

* Compact as soon as I reasonably can after 200k tokens. I have no data, but my usage absolutely skyrockets as I get into longer sessions. This is the most frustrating thing because Anthropic forced the 1M model on everyone.


Haha, yeah, my eyes glazed over immediately on the issue. This was absolutely someone telling their Claude Code to investigate why they ran out of tokens and open the issue.

Good chance it's not real or misdiagnosed. But it gives me some degree of schadenfreude to see it happening to the Claude Code repo.


I love the irony of it all. You reap what you sow.

But they (CC team) confirmed this is the case and intended behaviour, and closed the issue as not planned.

And you think companies aren't doing the same back to us? Are you sure you're speaking to a human?

It’s your Claude speaking to their Claude, which is fair, but it makes this whole discussion a bit dumb since we are basically talking about two bots arguing with each other.

This was part of Sam Altman’s (supposed) concerns about AI not being open and equally available. In a dystopian future it might be their cluster of 1,000 agents using a GWh of power to argue against your open-weights agent, which has to run on an M5.

I love how some comments tell you to turn max thinking on and others tell you to turn thinking off entirely. Apparently, they both save tokens!

Vibes, indeed.


Could be some logic to it: bad thinking is worse than either no thinking or good thinking.

The problem is actually that their cache invalidates randomly, which is why replaying inputs at 200k+ tokens sucks up all your usage. This is a bug in their systems that they refuse to acknowledge. My guess is that API clients evict subscription users’ caches early, which would explain this behavior; if so, it’s a feature, not a bug.

They also silently raised how much usage input tokens consume, so it’s a double whammy.


Can’t you turn the 1M off with a /model opus (or /model sonnet)?

At least up until recently the 1M model was separated into /model opus[1M]


The 1M context window is still a separate, non-default model in Claude Code and isn’t included with subscriptions (it’s billed at API rates only).

Opus[1m] has been the default model for Max subscriptions since 2.1.75.

https://github.com/anthropics/claude-code/commit/48b1c6c0ba0...


It depends on your account and seems to be random.

On my personal Max 5x account it’s not default and if I force it, it says I’ll pay API rates past 200k. On my other account that I use for work (not an enterprise account just another regular Max 5x account) the 1M model has been the default since that rollout. I’ve tried updating and reinstalling etc, and I can’t ever get the 1M default model on my personal account.

Based on other comments and discussion online as well as Claude code repo issues, it seems I’m not the only one not getting the 1M model for whatever reason and the issue continues to be unresolved.


What? Opus 1m has been in place for at least a few weeks for plan users.

Can confirm. Max effort helps; limiting context to <= ~20-25% is crucial now.

> * Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all tokens need to be rebuilt; this gets especially bad as token usage goes up.

Is this as opaque on their end as it sounds, or is there a way to check?


> * Turn on max thinking on every session. It saves tokens overall because I’m not correcting it or having it waste energy on dead paths.

This is definitely true. Ever since I realized there is an /effort max option, I am no longer fighting it that much and wasting hours.


> This is the most frustrating thing because Anthropic forced the 1M model on everyone.

This is spot on. It would be great (and very easy for them) to have a setting where you can force compaction at a much lower value, e.g. 300k tokens.


CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000
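A sketch of how you might apply that variable, assuming Claude Code actually honors it (as the comment above suggests; I haven’t verified this against official docs) and that you launch CC from the same shell:

```shell
# Set the auto-compact threshold (~400k tokens) before launching Claude Code.
# Assumption: CC reads CLAUDE_CODE_AUTO_COMPACT_WINDOW from the environment.
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000
# then start Claude Code from this shell, e.g.: claude
```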

Everything starts to feel like AI slop these days. Including this comment.

Further, Opus identified most of the vulnerabilities itself already. It just couldn’t exploit them.

Mythos seems much, much more creative and self directed, but I’m not yet convinced the core capabilities are significantly higher than what’s possible today.

The full price of finding the vulnerabilities was also something like $20k. That’s a price point at which you could bring in a skilled professional to accomplish the same task.


Remember, that's the most expensive this capability will ever be.

If its model is opened up and can run on commodity hardware. Otherwise, the price could go up as RAM and silicon prices climb.

Yes, but the problem with these models isn't a gradual shift, it's a step function. With a gradual shift, the world has time to react and adapt.

That part of things is what really made this entire argument fall apart for me.

There are ~50k psychiatrists in the US. Roughly, 1 in 10k people in the US is named Scott. Mathematically, that means knowing "Scott is a psychiatrist" brings you down to ~5 people. Even if we assume there's some outlier clustering of people named Scott who are psychiatrists, we're still talking about some small number.

Surely adding in the middle name essentially makes him uniquely identifiable without any other corroborating information.


> Roughly, 1 in 10k people in the US is named Scott.

Seems to be more like one in 425, per SSA.


Take a moment and apply some common sense to your math. Do you really think there are 5 psychiatrists in the country named Scott? That's off by multiple orders of magnitude.

No, but I doubt there are more than 100.

The magnitude is so small that anonymity is essentially broken.
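The arithmetic in this sub-thread can be sketched out directly, using the thread’s own (unverified) figures: ~50,000 US psychiatrists, and a “Scott” frequency of 1 in 10,000 (the original guess) vs. roughly 1 in 425 (the SSA figure cited above):

```python
# Back-of-envelope: expected number of US psychiatrists named Scott.
# Both name-frequency rates come from the thread, not verified sources.
psychiatrists = 50_000

guess_rate = 1 / 10_000  # original comment's estimate
ssa_rate = 1 / 425       # rate cited downthread from SSA data

print(psychiatrists * guess_rate)       # 5.0  -> the "~5 people" claim
print(round(psychiatrists * ssa_rate))  # 118  -> consistent with "no more than ~100"
```

Either way, the expected count is small enough that a middle name would plausibly pin down one individual.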


Keep in mind that Opus detected most of these vulnerabilities; it just didn’t exploit them (the article says as much).

I’m honestly not convinced this is changing the landscape significantly. It’s simply a bit better at self-directing.


Yes, I agree. I’m about to drop Claude Code because it’s become literally unusable.

Today, Opus went in circles trying to get a toggle button to work.


Same. Asked CC Opus about a change in a particular file...it looked in a totally different file and told me there was no change.

I've switched to max thinking mode as my default and that's helping in some capacity.

It's not necessarily back to where it was, but it's not desk-flipping bad.

