> doesn't that mean we're approaching optimality? No. Transformers are Markov ch...

pafoster · 2026-06-05T21:28:20 1780694900

Markov chains are themselves a kind of state machine, namely a probabilistic deterministic finite automaton (PDFA), albeit one where the state is governed solely by the N most recent symbols. (Deterministic means that, given a sequence, we can always infer the associated state transitions unambiguously.) I believe the example in the reference you provide represents the more general case of a PDFA, which is not representable as a finite-order Markov process.
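To make the distinction concrete, here is a rough Python sketch. The "even process" below is my own choice of a textbook PDFA (I am not claiming it is the example in the linked reference), and the names `EVEN_PROCESS`, `run`, `states_after_suffix` and `markov_step` are made up for illustration. In an order-N Markov chain the state is just the last N symbols; in the even process, a suffix consisting only of 1s never pins down the state, however long it is.

```python
# Hypothetical PDFA for the "even process": 1s between 0s come in even-length runs.
# Format: state -> {symbol: (probability, next_state)}
EVEN_PROCESS = {
    "A": {"0": (0.5, "A"), "1": (0.5, "B")},
    "B": {"1": (1.0, "A")},
}

def run(pdfa, start, symbols):
    """Follow the unique transition per symbol; None if the string is impossible."""
    state = start
    for s in symbols:
        if s not in pdfa[state]:
            return None
        state = pdfa[state][s][1]
    return state

def states_after_suffix(pdfa, suffix):
    """States the automaton could be in after `suffix`, over all possible start states."""
    return {run(pdfa, q, suffix) for q in pdfa} - {None}

def markov_step(state, symbol, n):
    """Order-n Markov chain viewed as a PDFA: the state is the last n symbols."""
    return (state + symbol)[-n:]

if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, states_after_suffix(EVEN_PROCESS, "1" * n))
    print(states_after_suffix(EVEN_PROCESS, "011"))
    print(markov_step("ab", "c", 2))
```

Any window of N symbols that happens to be all 1s leaves both states possible, so no finite N recovers the state from the suffix alone, while a single 0 in the window does.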

hnsr · 2026-06-06T01:37:20 1780709840

I think that might have been the example given here: https://bactra.org/notebooks/nn-attention-and-transformers.h...

drdeca · 2026-06-05T23:01:41 1780700501

Huh? Any process on a computer by itself is also a Markov chain.

If you include all the information the LLM uses to produce the next token as part of the state, then of course the LLM is a Markov chain.

So would any other finite-memory process for sampling continuations of a text.
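A toy sketch of what I mean, in Python. Everything here is hypothetical (`toy_model` is a stand-in for an LLM, and `WINDOW` is an arbitrary context length); the point is only that once the state is the whole truncated context, the next state depends on nothing but the current state.

```python
import random

WINDOW = 3  # hypothetical context-window length

def toy_model(context):
    """Stand-in for an LLM: maps a context tuple to a next-token distribution."""
    p_a = (1 + context.count("a")) / (2 + len(context))
    return {"a": p_a, "b": 1.0 - p_a}

def transition(state):
    """Markov kernel: the state is the entire (truncated) context."""
    return {(state + (tok,))[-WINDOW:]: p for tok, p in toy_model(state).items()}

def sample_chain(state, steps, rng):
    """Walk the chain; each step looks only at the current state."""
    for _ in range(steps):
        nxt = transition(state)
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return state

if __name__ == "__main__":
    print(transition(("a", "b", "a")))
    print(sample_chain((), 10, random.Random(0)))
```

The state space is huge (every possible window), but it is finite, which is all the Markov chain framing requires.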