As the other commenter said, R1 required very standard RLHF techniques too. But a fun way to think about it is that reasoning models are going to get bigger and lift the RLHF boat along with them.

But we need a few years to establish basics before I can write a cumulative RL for LLMs book ;)



This is a GREAT book. If you decide to write it in a rolling fashion, you'd have at least one reader from the start :)



