
Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?


No, we don't do anything to mitigate it right now. In principle we could run the judge several times with the rollouts in a different order each time and average the scores.
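A minimal sketch of that re-ordering idea, not the actual evaluation code. It assumes a hypothetical `judge` callable that takes the rollouts in presentation order and returns one numeric score per rollout in that same order; `average_over_orderings`, `n_orders`, and `seed` are illustrative names.

```python
import random
from typing import Callable, List, Sequence


def average_over_orderings(
    rollouts: Sequence[str],
    judge: Callable[[List[str]], List[float]],  # hypothetical judge interface
    n_orders: int = 4,
    seed: int = 0,
) -> List[float]:
    """Score each rollout under several random presentation orders and average."""
    rng = random.Random(seed)
    totals = [0.0] * len(rollouts)
    for _ in range(n_orders):
        # Pick a random presentation order for this judging pass.
        order = list(range(len(rollouts)))
        rng.shuffle(order)
        scores = judge([rollouts[i] for i in order])
        # Map each score back to the rollout's original index.
        for position, original_index in enumerate(order):
            totals[original_index] += scores[position]
    return [total / n_orders for total in totals]


# Toy judge with no position bias: the score is just the text length.
def length_judge(items: List[str]) -> List[float]:
    return [float(len(item)) for item in items]


averaged = average_over_orderings(["a", "bb", "ccc"], length_judge)
```

The cost is `n_orders` judge calls per group of rollouts, so this trades extra inference for less sensitivity to where a rollout happens to appear.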

We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
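A rough sketch of that measurement, assuming each run is logged as a list of judge scores in the order the rollouts were shown. The function name and the toy scores are made up for illustration and are not real results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def mean_score_by_position(runs: Sequence[Sequence[float]]) -> Dict[int, float]:
    """runs[i][p] is the score given to the rollout shown at position p in run i."""
    scores_at = defaultdict(list)
    for scores in runs:
        for position, score in enumerate(scores):
            scores_at[position].append(score)
    # Average per position; runs may have different numbers of rollouts.
    return {pos: mean(vals) for pos, vals in sorted(scores_at.items())}


# Made-up scores on a 1-10 scale, three runs of three rollouts each.
toy_runs: List[List[int]] = [[9, 5, 4], [8, 6, 2], [7, 7, 3]]
position_means = mean_score_by_position(toy_runs)
```

If rollouts are assigned to positions at random, the per-position means should be close to each other when there is no order bias; a consistent slope across positions over many runs would suggest the judge favors earlier or later slots.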



