> do have long-term memory from their training and thus act very much like someone suffering from Alzheimer’s.
Your 8th grade science teacher may be disappointed too. Drawing such analogies in unequivocal language ("very much like") disregards our limited understanding of LLMs, the false parallels between computer and biological systems, and the complex nature of Alzheimer's disease (no, it is not just short-term memory loss, not even close; it also affects, for example, the ability to interpret images).
Hmm. The point was that people with Alzheimer's have trouble interpreting images, and obviously remain conscious until the latest stages of their disease.
> remain conscious until the latest stages of their disease.
Are you saying that people with advanced Alzheimer's lose consciousness? That's not the case. Although it might become hard for people with advanced Alzheimer's to demonstrate their consciousness, that doesn't mean their consciousness isn't there.
It's a dumb joke considering Germany has been one of the most peaceful countries for decades. And the people making the jokes are often citizens of a country actively engaged in wars.
Germany's pacifism is, just like its green energy transition, hypocritical and ineffective. Their Energiewende was to shut down nuclear to bring back coal. Their Zeitenwende amounted to bankrolling Putin's war machine via the Nord Stream pipelines at the expense of the very same countries they tried to annihilate in WW2. So yeah, I think I can crack a joke.
What makes ggplot great is that it allows manual adjustments AND has a nice declarative grammar. Hard for me to see the value of a plotting library without being able to adjust plots.
Also a bad analogy. A slice of pizza has no onboarding cost for the user. You eat it and that's it. A PDF editor requires you to understand how to use it.
A better comparison would be a pizza shop at the end of a long hike that advertised itself online as offering an infinite amount of free pizza. So you go on the hike, and then it turns out you only get one slice and have to pay a fortune for the rest. You planned to get free food at the end of the hike, but it turns out the food you end up eating is not free and not even cheap.
It might work, I considered running a test like this. But it does demand certain things.
The subnetwork has to be either crafted as "gradient resistant" or remain frozen. Not all discovered or handcrafted circuits would survive gradient pressure as is. Especially the kind of gradients that fly in early pre-training.
It has to be able to interface with native representations that would form in a real LLM during pre-training, which is not trivial. This should happen early enough in pre-training. Gradients must start routing through our subnetwork. We can trust "rich get richer" dynamics to take over from there, but for that, we need the full network to discover the subnetwork and start using it.
And finally, it has to start being used for what we want it to be used for. It's possible that an "addition primitive" structure would be subsumed for something else, if you put it into the training run early enough, when LLM's native circuitry is nonexistent.
Overall, for an early test, I'd spray 200 frozen copies of the same subnetwork into an LLM across different layers and watch the dynamics as it goes through pre-training. Roll extra synthetic addition problems into the pre-training data to help discovery along. Less of a principled solution and more of an engineering solution.
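The "gradient resistant or frozen" requirement is easy to prototype with a masked update. Here's a minimal sketch (pure numpy, toy sizes; the identity matrix stands in for a handcrafted circuit, and the random gradients stand in for early pre-training noise — all names and shapes are mine, not from any real training setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix with a handcrafted subnetwork planted in one block.
W = rng.normal(size=(8, 8))
subnet = np.eye(3)          # stand-in for a handcrafted circuit
W[2:5, 2:5] = subnet
W0 = W.copy()

# Freeze mask: 1 where gradients may flow, 0 over the planted subnetwork.
mask = np.ones_like(W)
mask[2:5, 2:5] = 0.0

def sgd_step(W, grad, lr=0.1):
    """Apply a gradient step everywhere except the frozen block."""
    return W - lr * grad * mask

# Simulate noisy "early pre-training" updates.
for _ in range(100):
    W = sgd_step(W, rng.normal(size=W.shape))

# The surrounding weights drift, but the planted circuit survives intact.
assert np.allclose(W[2:5, 2:5], subnet)
assert not np.allclose(W, W0)
```

Spraying 200 copies is then just 200 such masked blocks across layers; the interesting measurement is how much gradient ends up routed through each copy's inputs over training.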
+1 I’ve always had the feeling that training from randomly initialized weights without seeding some substructure is unnecessarily slowing LLM training.
Similarly I’m always surprised that we don’t start by training a small set of layers, stack them and then continue.
Better-than-random initialization is underexplored, but there are some works in that direction.
One of the main issues is: we don't know how to generate useful computational structure for LLMs - or how to transfer existing structure neatly across architectural variations.
What you describe sounds more like a "progressive growing" approach, which isn't the same, but draws from some similar ideas.
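For the "train small, stack, continue" idea, the standard function-preserving trick from progressive-growing work is to splice in an identity layer, so the deeper network starts out computing exactly what the shallow one did. A toy sketch (numpy, random weights standing in for a trained stage; the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

# A small "already trained" 2-layer MLP (random weights as stand-ins).
W1 = rng.normal(size=(16, 4))
W2 = rng.normal(size=(2, 16))

def shallow(x):
    return W2 @ relu(W1 @ x)

# Grow the net: splice an identity layer in after the ReLU. Since ReLU
# outputs are nonnegative, relu(I @ h) == h, so the function is unchanged
# and training can continue from a deeper, function-preserving start.
W_id = np.eye(16)

def deeper(x):
    h = relu(W1 @ x)
    return W2 @ relu(W_id @ h)

x = rng.normal(size=4)
assert np.allclose(shallow(x), deeper(x))
```

The new layer then gets perturbed and trained along with the rest, rather than starting the whole stack from random init.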
In terms of substructure: in the old days of Core Wars, randomly scattering bits of code that did things could pay off. I'm imagining something similar for LLMs: just set 10% of weights as specific known structures and watch which are retained/utilized by models and which get treated like random init.
I had that in mind too. What if you handcraft a subnetwork with (some subset of) Turing machine capability? Do those kinds of circuits emerge naturally during training? Can reasoning use them for complex computation?
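To be concrete, a handcrafted circuit in this sense can be as small as a fixed linear map. A toy "addition primitive" (pure numpy; the framing of two input slots feeding a read-out is my illustration, not a claim about real LLM internals):

```python
import numpy as np

# A handcrafted addition primitive: a fixed linear read-out that sums
# two input slots. This is the kind of tiny circuit one could plant
# and then check whether pre-training learns to route through it.
W_add = np.array([[1.0, 1.0]])   # y = a + b

def addition_circuit(a, b):
    return float(W_add @ np.array([a, b]))

assert addition_circuit(2.0, 3.0) == 5.0
```

A Turing-machine-style subnetwork would be the same idea scaled up: fixed weights implementing a state-transition table, with the open question being whether the surrounding network ever learns to feed it the right representations.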
Perhaps the real issue is the gate-keeping scientific publishing model. Journals had a place and role, and peer review is a critical aspect of the scientific process, but new times (the internet, citizen science, higher levels of scientific literacy, and now AI) diminish the benefits of journals creating "barriers to entry", as you put it.
I for one hope not to live in a world where academic journals fall out of favor and are replaced by vibe-coded papers by citizen scientists with inflated egos from one too many “you’re absolutely right!” Claude responses.
Me neither, but what you present is a false dichotomy. Science used to be a pastime of the wealthy elites; it became a profession. By opening it up, progress was accelerated. The same will happen when publication is made more open and accessible.