HarHarVeryFunny on Sept 13, 2024 | on: OpenAI threatens to revoke o1 access for asking it...

Yes, but I suspect that the goals of the RL (in order to reason, we need to be able to "break down tricky steps into simpler ones", etc.) were hand-chosen, and then a training set demonstrating these reasoning capabilities/components was constructed to match.