Low scores on HLE and ARC AGI might be a good sign: it suggests they didn't Goodhart their models. ARC AGI in particular doesn't mean much, IMO. It's just some weird, hard geometric induction, and I don't think it correlates well with real-world problem solving.

AFAICT, Claude Code has the biggest engineering mind share. An Apple software engineer I know says he sometimes uses $100/day of Claude Code tokens at work and gets sad, because that's the budget.

Also, look at costs and revenue. OpenAI is bleeding way more money than Anthropic.