mark_yellow's comments

One can manipulate LLM rankings to put any model in the lead simply by changing the single character that separates demonstration examples.

- MMLU performance varies by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models, such as GPT-4o, are also brittle to the choice of delimiter.
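
To make "the single character separating demonstration examples" concrete, here is a rough sketch of how the same few-shot prompt changes when only the delimiter is swapped. Everything in it is an assumption for illustration: the demonstrations, the `Q:`/`A:` template, the `build_prompt` helper, and the list of delimiters are made up, not taken from the evaluation described above.

```python
# Hypothetical demonstrations; these are not MMLU items.
DEMOS = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def build_prompt(demos, query, delimiter="\n"):
    """Join few-shot demonstrations and the final query with one delimiter."""
    blocks = [f"Q: {q} A: {a}" for q, a in demos]
    blocks.append(f"Q: {query} A:")
    return delimiter.join(blocks)

# Candidate single-character delimiters (an illustrative choice).
DELIMITERS = ["\n", " ", "#", "!", ","]

# Same demonstrations, same query; only the separator differs.
prompts = {d: build_prompt(DEMOS, "What is 3 + 3?", d) for d in DELIMITERS}

for d, p in prompts.items():
    print(repr(d), "->", repr(p))
```

Scoring each of these prompt variants with the same model and comparing accuracy would be one way to measure how sensitive a benchmark result is to this choice.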

