Hacker News | raron's comments

You could just jail the CEO or whoever was responsible for security at that agency / company.

How big is this cached data? Wouldn't it be possible to download it after a few minutes of idling "to suspend the session", then upload and restore it when the user starts their next interaction?

It should be about 10-20 GiB per session. Save/restore is exactly what DeepSeek does using its 3FS distributed filesystem: https://github.com/deepseek-ai/3fs#3-kvcache
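As a sanity check on that 10-20 GiB figure, here's a back-of-envelope KV-cache size estimate. All of the model dimensions below are illustrative assumptions, not DeepSeek's actual architecture:

```python
# Back-of-envelope KV-cache size for a dense transformer with grouped-query
# attention. All dimensions are illustrative assumptions.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for the separate K and V tensors; fp16/bf16 stores 2 bytes/element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical large model: 64 layers, 8 KV heads of dim 128 (GQA),
# an 80k-token conversation, fp16 cache
size = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, seq_len=80_000)
print(f"{size / 2**30:.1f} GiB")  # -> 19.5 GiB, inside the 10-20 GiB ballpark
```

Without GQA/MLA-style KV-head reduction the same context would be an order of magnitude larger, which is why offloading it to a distributed filesystem instead of keeping it in GPU memory is attractive.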

With this much cheaper, disk-backed setup, they can offer a much better caching experience:

> Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.


I often see a local model (QWEN3.5-Coder-Next) grow to about 5 GB or so over the course of a session using llamacpp-server. I'd bet these trillion-parameter models are even worse. Even if you wanted to download it or offload it, or someone offered that as a service, you'd _still_ be paying the token cost to start back up again, because all of that context _is_ the tokens you've just done.

The cache is what makes your journey from a 1k-token prompt to a 1-million-token solution speedy in one 'vibe' session. Reloading it will cost the entire journey again.


What they mean when they say 'cached' is that it is loaded into GPU memory on Anthropic's servers.

You already have the data on your own machine, and that 'upload and restore' process is exactly what happens when you restart an idle session. The issue is that it takes time, and it counts as token usage, because you have to send the data for the GPU to load, and that data is the 'tokens'.


Wrong on both counts. The kv-cache is likely to be offloaded to RAM or disk. What you have locally is just the log of messages. The kv-cache is the internal LLM state after having processed these messages, and it is a lot bigger.

I shouldn't have said 'loaded into GPU memory', but my point still stands: the cached data is on the Anthropic side, which means that caching more locally isn't going to help with that.

> upload and restore it when the user starts their next interaction

The data is the conversation (along with the thinking tokens).

There is no download - you already have it.

The issue is that it gets expunged from the (very expensive, very limited) GPU cache and to reload the cache you have to reprocess the whole conversation.

That is doable, but as Boris notes it costs lots of tokens.
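A toy cost model of what that reload looks like. The prices below are made-up placeholders for illustration, not Anthropic's actual rates:

```python
# Illustrative cost of re-sending a long conversation after cache eviction.
# Prices are made-up placeholders, not any provider's actual rates.
PRICE_INPUT  = 3.00 / 1_000_000   # $/token for uncached input (assumed)
PRICE_CACHED = 0.30 / 1_000_000   # $/token on a cache hit (assumed)

def turn_cost(context_tokens, cache_hit):
    """Input cost of one request that re-sends the whole conversation."""
    return context_tokens * (PRICE_CACHED if cache_hit else PRICE_INPUT)

# A 500k-token conversation: warm cache vs. cache evicted and rebuilt
print(round(turn_cost(500_000, cache_hit=True), 2))   # 0.15
print(round(turn_cost(500_000, cache_hit=False), 2))  # 1.5
```

The gap scales linearly with context length, so a long session that falls out of the cache pays the full-rate reprocessing cost on every restart.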


You're quite confidently wrong! :-)

The kv-cache is the internal LLM state after having processed the tokens. It's big, and you do not have it locally.


> The kv-cache is the internal LLM state after having processed the tokens. It's big, and you do not have it locally.

Yes - generated from the data of the conversation.

Read what I said again. I'm explaining how they regenerate the cache by running the conversation through the LLM to reconstruct the KV cache state.


AFAIK they did a lot of illegal things in the Snowden era, too.


> Thus succeeding at making the telecommunications vendors used for Top Secret US national security data less secure, the obvious goal of the US National Security Agency

The NSA still has the secret Suite A algorithms for their most sensitive information. If they think those are better than the current public algorithms, and their goal is to give telecommunications vendors better encryption, then why don't they publish them so telcos could use them?

> Truly, truly can't understand why anyone finds this line of reasoning plausible. (Before anyone yells Dual_EC_DRBG, that was a NOBUS backdoor, which is an argument against the NSA promoting mathematically broken cryptography, if anything.)

The NSA weakened DES against brute-force attacks by reducing the key size (while making it stronger against differential cryptanalysis, admittedly).

https://en.wikipedia.org/wiki/Data_Encryption_Standard#NSA's...
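For scale: Lucifer, DES's predecessor, supported keys of up to 128 bits, while DES shipped with 56. The arithmetic on the brute-force search space:

```python
# Effect of the key-size reduction on brute-force effort.
# Lucifer (DES's predecessor) supported up to 128-bit keys; DES shipped with 56.
lucifer_keyspace = 2 ** 128
des_keyspace = 2 ** 56

print(des_keyspace)                      # 72057594037927936 candidate DES keys
print(lucifer_keyspace // des_keyspace)  # brute force got 2**72 times easier
```

2^56 was already within reach of a determined, well-funded attacker by the 1990s (the EFF's Deep Crack machine searched it in days), which is what makes the reduction look deliberate rather than incidental.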

Also, the NSA put a broken cipher in the Clipper Chip (besides all the other vulnerabilities).


The thing that sets this effort apart from DES and Clipper is that the USG actually has skin in the game. Neither DES nor Clipper was ever intended or approved to protect classified information.

These are algorithms that the NSA will use in real systems to protect information up to the TOP SECRET codeword level, through programs such as CNSA 2.0 [1] and CSfC [2].

[1] https://media.defense.gov/2025/May/30/2003728741/-1/-1/0/CSA...

[2] https://www.nsa.gov/Resources/Commercial-Solutions-for-Class...


> Since then, public cryptographic research has been ahead or even with state work.

How can we know that?

> Who knows what is happening inside the NSA or military facilities?

Couldn't the NSA have found an issue with ML-KEM and be trying to convince people to use it exclusively (rather than in a hybrid scheme with ECC)?


Couldn't NSA have not known about an issue with ML-KEM, and thus wanted to prevent its commercial acceptance, which it did simply by approving the algorithm?

What's the PQC construction you couldn't say either thing about?


> Couldn't NSA have not known about an issue with ML-KEM, and thus wanted to prevent its commercial acceptance, which it did simply by approving the algorithm?

Could have, but they did not do that. So the question remains: why?


I think you may have missed my point.


Follow NSA Suite B and what the US government mandates at different levels of classification.


Kyber/ML-KEM-only is exactly the Suite B (CNSA 2.0) recommendation.


I think you could use dm-integrity over the raw disks to get checksums and protect against bitrot; then you can use mdraid to build a RAID1/5/6 out of the virtual block devices presented by dm-integrity.

I suspect this is still vulnerable to the write hole problem.

You can add LVM to get snapshots, but this is still not the end-to-end copy-on-write solution that btrfs and ZFS provide.


Why does a routing protocol matter to the banking sector? With proper encryption, the route the packets of transaction data take should not matter at all.


Based on the EU's public consultation, that is not even true (though the number of responses is very small):

https://ec.europa.eu/info/law/better-regulation/have-your-sa...


The next SteamOS release will use Wayland by default for desktop mode, too:

https://steamcommunity.com/games/1675200/announcements/detai...


> I'm not seeing a lot of regrets from folks who moved to TLC and QLC NAND, and those products are more popular than ever.

That's interesting. Even TLC has serious limitations, but QLC is basically useless unless you treat it as write-once-read-many memory.

I wish I had bought a lot of SSDs while you could still buy MLC ones.


> QLC is basically useless unless you use it as write-once-read-many memory

The market thoroughly disagrees with your stupid exaggeration. QLC is a high-volume mainstream product. It's popular in low-end consumer SSDs, where the main problem is not endurance but sustained performance (especially writing to a mostly-full drive). A Windows PC is hardly a WORM workload.


Seems like it is, though? Most consumer usage doesn't have much churn. For things like the browser cache that do churn, the total volume isn't that high.

The comparison here is database and caching workloads in the datacenter that experience high churn at an extremely high sustained volume. Many such workloads exist.


Consumer usage does not have much churn, but the average desktop is probably doing 5-50 drive writes per year. That's far from a heavy database load, but it's just as far from WORM.
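A rough sketch of that arithmetic; the workload and endurance figures below are assumptions for illustration:

```python
# Where a typical desktop sits between WORM and datacenter churn.
# Workload and endurance figures are illustrative assumptions.
drive_capacity_gb = 1000          # a 1 TB drive
host_writes_gb_per_day = 40       # assumed desktop write volume

drive_writes_per_year = host_writes_gb_per_day * 365 / drive_capacity_gb
print(round(drive_writes_per_year, 1))   # 14.6 -- inside the 5-50 range

# Even a modest 0.3 DWPD endurance rating (common for budget drives)
# would allow ~110 drive writes/year, leaving plenty of margin.
qlc_rated_dwpd = 0.3
print(round(qlc_rated_dwpd * 365, 1))    # 109.5
```

So the typical desktop is writing at well under the rated endurance of even cheap drives, while still being nothing like write-once storage.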


There's a very big difference between a workload where you have to take care to structure your IO to minimize writes so you don't burn out the drive, and a workload that is simply easy enough that you don't have to care about the write endurance because even the crappy drives will last for years.


Of course. The inferior but cheaper technology is more cost-effective in most cases, but for certain workloads that won't be true despite the lower upfront cost per unit.

The workloads flash is more cost-effective for (i.e., most of them) either aren't all that write-heavy or leave the drive sitting idle the vast majority of the time. The typical consumer use case is primarily reads while the drive mostly sits idle, with the relevant performance metrics largely determined by occasional bursts of activity.

