
Yes, but I suspect that the goals of the RL (in order to reason, we need to be able to "break down tricky steps into simpler ones", etc.) were hand-chosen, and that a training set demonstrating these reasoning capabilities/components was then constructed to match.

