"Informally, from the point of view of algorithmic information theory, the information content of a string is equivalent to the length of the most-compressed possible self-contained representation of that string. A self-contained representation is essentially a program—in some fixed but otherwise irrelevant universal programming language—that, when run, outputs the original string."
Where it gets tricky is the "self-contained" bit. It's only true with the model weights as a code book, e.g. to allow the LLM to "know about" Slack.
"Informally, from the point of view of algorithmic information theory, the information content of a string is equivalent to the length of the most-compressed possible self-contained representation of that string. A self-contained representation is essentially a program—in some fixed but otherwise irrelevant universal programming language—that, when run, outputs the original string."
Where it gets tricky is the "self-contained" bit. It's only true with the model weights as a code book, e.g. to allow the LLM to "know about" Slack.