If you include all the information the LLM uses to produce the next token as part of the state, then of course the LLM is a Markov chain.
So would any other finite-memory process for sampling continuations of a text.
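To make the point concrete, here is a minimal sketch of a finite-memory sampler written as a Markov chain. Everything in it is a made-up illustration: `next_token_probs` is a toy stand-in for a model (not a real LLM), and the two-token vocabulary and window length `K` are arbitrary assumptions. The only thing it is meant to show is that once the whole context window is the state, the next state depends on the current state alone.

```python
import itertools
import random

VOCAB = ("a", "b")  # toy vocabulary (assumption for illustration)
K = 2               # context window length, i.e. the finite memory


def next_token_probs(context):
    """Toy stand-in for a model: a distribution over the next token.

    It looks only at `context`, a tuple of the last K tokens.
    """
    p_a = (1 + context.count("a")) / (K + 2)
    return {"a": p_a, "b": 1.0 - p_a}


def step(state, rng):
    """One Markov transition: sample a token, then slide the window."""
    probs = next_token_probs(state)
    token = rng.choices(list(probs), weights=list(probs.values()))[0]
    return state[1:] + (token,)


def transition_matrix():
    """Explicit transition probabilities over all len(VOCAB)**K states."""
    states = list(itertools.product(VOCAB, repeat=K))
    matrix = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for token, p in next_token_probs(s).items():
            matrix[s][s[1:] + (token,)] += p
    return matrix


if __name__ == "__main__":
    rng = random.Random(0)
    state = ("a", "b")
    for _ in range(5):
        state = step(state, rng)
        print(state)
```

Note that `step` takes nothing but the current state and a source of randomness, which is exactly the Markov property. The cost is the size of the state space: it grows as `len(VOCAB) ** K`, so for a real vocabulary and context length the chain is well defined but far too large to write down.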