natolambert on Feb 2, 2025 | on: RLHF Book

As the other commenter said, R1 required very standard RLHF techniques too. But a fun way to think about it is that reasoning models are going to be bigger and uplift the RLHF boat. But we need a few years to establish basics before I can write a cumulative RL for LLMs book ;)

JackYoustra on Feb 7, 2025

This is a GREAT book. If you decide to write it in a rolling fashion, you'd have at least one reader from the start :)