How big is this cached data? Wouldn't it be possible to download it after the session idles for a few minutes ("suspending" it), then upload and restore it when the user starts their next interaction?
I often see a local model, QWEN3.5-Coder-Next, grow to about 5 GB of cache over the course of a session using llamacpp-server. I'd bet these trillion-parameter models are even worse. And even if you wanted to download it, offload it, or have that offered as a service, you'd _still_ be paying the token cost to start back up again, because all of that context _is_ the tokens you've just processed.
The cache is what makes the journey from a 1k-token prompt to a 1-million-token solution speedy within one 'vibe' session. Rebuilding it costs the entire journey again.
What they mean by 'cached' is that it is loaded into GPU memory on Anthropic's servers.
You already have the data on your own machine, and that 'upload and restore' process is exactly what happens when you resume an idle session. The issue is that it takes time, and it counts as token usage, because you have to send the data for the GPU to process, and that data _is_ the tokens.
Wrong on both counts. The kv-cache is likely to be offloaded to RAM or disk. What you have locally is just the log of messages. The kv-cache is the internal LLM state after having processed these messages, and it is a lot bigger.
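For scale, a rough back-of-the-envelope estimate. This assumes a GQA architecture with Llama-70B-like shapes; the real model's dimensions aren't public, so every number here is illustrative:

```python
# Per-token KV-cache size: K and V tensors for every layer and KV head.
# All shapes below are assumptions (Llama-70B-like), not Anthropic's model.
layers, kv_heads, head_dim = 80, 8, 128  # GQA: 8 KV heads
bytes_per_elem = 2                        # fp16/bf16
tokens = 100_000

per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
total_gib = per_token * tokens / 2**30
print(f"{per_token / 1024:.0f} KiB/token, {total_gib:.1f} GiB for {tokens:,} tokens")
# ~320 KiB/token, ~31 GiB -- versus roughly 400 KB for the raw text itself
```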
I shouldn't have said 'loaded into GPU memory', but my point still stands: the cached data lives on Anthropic's side, so caching more locally isn't going to help with that.
> upload and restore it when the user starts their next interaction
The data is the conversation (along with the thinking tokens).
There is no download - you already have it.
The issue is that it gets expunged from the (very expensive, very limited) GPU cache and to reload the cache you have to reprocess the whole conversation.
That is doable, but as Boris notes it costs lots of tokens.
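To put a hedged number on "lots of tokens": with Claude-style prompt-caching prices (assuming $3/MTok base input and cache reads at 10% of that; check current pricing), restarting a long idle session looks roughly like this:

```python
# Illustrative cost of reprocessing a long conversation after the KV cache
# has been evicted. Prices are assumptions, not a quote of current rates.
input_per_mtok = 3.00                        # assumed base input price, $/MTok
cache_read_per_mtok = 0.10 * input_per_mtok  # assumed 10% cache-hit price
context_tokens = 500_000

cold = context_tokens / 1e6 * input_per_mtok       # cache expired: full prefill
warm = context_tokens / 1e6 * cache_read_per_mtok  # cache hit
print(f"cold restart: ${cold:.2f}, warm turn: ${warm:.2f}")  # $1.50 vs $0.15
```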
> Thus succeeding at making the telecommunications vendors used for Top Secret US national security data less secure, the obvious goal of the US National Security Agency
The NSA still has its secret Suite A algorithms for the most sensitive information. If they think those are better than the current public algorithms, and their goal is for telecommunications vendors to have better encryption, then why don't they publish them so the telcos could use them?
> Truly, truly can't understand why anyone finds this line of reasoning plausible. (Before anyone yells Dual_EC_DRBG, that was a NOBUS backdoor, which is an argument against the NSA promoting mathematically broken cryptography, if anything.)
The NSA weakened DES against brute-force attack by reducing the key size (though it also made the cipher stronger against differential cryptanalysis).
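For a sense of scale (illustrative arithmetic; IBM's Lucifer, the ancestor of DES, supported keys up to 128 bits, while DES shipped with an effective 56):

```python
# Each bit removed from the key halves the brute-force cost.
lucifer_bits, des_bits = 128, 56
print(f"keyspace shrinks by a factor of 2^{lucifer_bits - des_bits}")
# 2^56 ~ 7.2e16 keys proved searchable in practice: EFF's Deep Crack
# machine brute-forced a DES key in days back in 1998.
```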
The thing that sets this effort apart from DES and Clipper is that the USG actually has skin in the game. Neither DES nor Clipper was ever intended or approved to protect classified information.
These are algorithms the NSA will use in real systems to protect information up to the TOP SECRET codeword level, through programs such as CNSA 2.0[1] and CSfC.
Couldn't the NSA have known about an issue with ML-KEM, and thus wanted to promote its commercial acceptance, which it did simply by approving the algorithm?
What's the PQC construction you couldn't say either thing about?
> Couldn't the NSA have known about an issue with ML-KEM, and thus wanted to promote its commercial acceptance, which it did simply by approving the algorithm?
They could have, but they did not. So the question becomes: why?
I think you could use dm-integrity over the raw disks to get checksums and protection against bitrot, then use mdraid to build a RAID1/5/6 out of the virtual block devices that dm-integrity presents.
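A minimal sketch of that stack, assuming three hypothetical member disks (run as root; this wipes them):

```python
# Sketch: dm-integrity under mdraid. A checksum failure in dm-integrity
# surfaces as a read error, which mdraid then repairs from parity/mirror.
import subprocess

disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # hypothetical members
members = []

for i, disk in enumerate(disks):
    name = f"int{i}"
    # Format each raw disk with per-sector checksums, then expose it as a
    # virtual block device at /dev/mapper/<name>.
    subprocess.run(["integritysetup", "format", "--batch-mode", disk], check=True)
    subprocess.run(["integritysetup", "open", disk, name], check=True)
    members.append(f"/dev/mapper/{name}")

# RAID5 over the integrity-protected virtual devices.
subprocess.run(["mdadm", "--create", "/dev/md0", "--level=5",
                f"--raid-devices={len(members)}", *members], check=True)
```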
I suspect this is still vulnerable to the write hole problem.
You can add LVM on top to get snapshots, but this is still not an end-to-end copy-on-write solution like btrfs and ZFS provide.
Why does a routing protocol matter to the banking sector? With proper encryption, the route the packets of transaction data take shouldn't matter at all.
> QLC is basically useless unless you use it as write-once-read-many memory
The market thoroughly disagrees with your stupid exaggeration. QLC is a high-volume mainstream product. It's popular in low-end consumer SSDs, where the main problem is not endurance but sustained performance (especially writing to a mostly-full drive). A Windows PC is hardly a WORM workload.
Seems like it is, though? Most consumer usage doesn't have much churn, and for things like the browser cache that do churn, the total volume isn't that high.
The comparison here is database and caching workloads in the datacenter that experience high churn at an extremely high sustained volume. Many such workloads exist.
Consumer usage does not have much churn, but the average desktop is probably doing 5-50 drive writes per year. That's far away from a heavy database load, but it's just as far away from WORM.
There's a very big difference between a workload where you have to take care to structure your IO to minimize writes so you don't burn out the drive, and a workload that is simply easy enough that you don't have to care about the write endurance because even the crappy drives will last for years.
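A hedged back-of-the-envelope on "last for years", assuming a 1 TB QLC drive with a 400 TBW endurance rating (ratings vary widely by model):

```python
# Years until the rated endurance is exhausted at the usage estimated above.
capacity_tb = 1.0
tbw_rating = 400            # assumed TBW rating; check the actual spec sheet
drive_writes_per_year = 50  # upper end of the desktop estimate above

years = tbw_rating / (capacity_tb * drive_writes_per_year)
print(f"~{years:.0f} years to hit the rated endurance")  # ~8 years
```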
Of course. The inferior but cheaper technology is more cost-effective in most cases, but for certain workloads it won't be, despite the lower upfront cost per unit.
The workloads flash is more cost-effective for (i.e. most of them) either aren't all that write-heavy, or else leave the drive sitting idle the vast majority of the time. The typical consumer use case is primarily reads while the drive mostly sits idle, with the relevant performance metrics largely determined by occasional bursts of activity.