

maxrmk 9 months ago | on: Show HN: RULER – Easily apply RL to any agent

Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?

kcorbitt 9 months ago

No, we don't do anything to mitigate it. In theory, we could judge several times with different orderings.
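A minimal sketch of what judging several times with different orderings could look like. This is not RULER's implementation: `judge_with_shuffles` and the `judge` callable (takes a list of rollouts, returns one score per rollout in the same order) are hypothetical names assumed for illustration.

```python
import random
from typing import Callable, List, Sequence


def judge_with_shuffles(
    rollouts: Sequence[str],
    judge: Callable[[List[str]], List[float]],  # hypothetical judge interface
    n_orderings: int = 4,
    seed: int = 0,
) -> List[float]:
    """Score the same rollouts under several random orderings and average."""
    rng = random.Random(seed)
    totals = [0.0] * len(rollouts)
    for _ in range(n_orderings):
        # Pick a random presentation order for this judging pass.
        order = list(range(len(rollouts)))
        rng.shuffle(order)
        scores = judge([rollouts[i] for i in order])
        # Map each positional score back to the rollout it belongs to.
        for position, original_index in enumerate(order):
            totals[original_index] += scores[position]
    return [total / n_orderings for total in totals]
```

The trade-off is cost: each extra ordering is another judge call, so the number of orderings multiplies the evaluation budget.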

We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
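The measurement described above could be sketched roughly as follows. It assumes each run is logged as a list of judge scores in the order the rollouts were presented; `mean_score_by_position` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_score_by_position(runs: Sequence[Sequence[float]]) -> List[float]:
    """Average the judge's score at each rollout position across many runs."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for scores in runs:
        for position, score in enumerate(scores):
            sums[position] += score
            counts[position] += 1
    # Runs may have different lengths, so divide by the per-position count.
    return [sums[p] / counts[p] for p in sorted(counts)]
```

If rollouts land in positions independently of their quality, the per-position averages should be roughly flat; a consistent slope (say, the first position scoring higher) would suggest order bias.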
