I wonder if it has to do with how meaning is tied to the tokens. c+amara+derie (using the official gpt-5 tokenizer).
There's also just that weird thing where they're obsessed with emoji which I've always assumed is because they're the only logograms in english and therefore have a lot of weight per byte.
There's also just that weird thing where they're obsessed with emoji which I've always assumed is because they're the only logograms in english and therefore have a lot of weight per byte.