
> These […] get better every day.

They do, but I’ve seen a huge slowdown in “getting better” in the last year. I wonder if it’s my perception or reality. Each model does better on benchmarks, but I’m still experiencing at least a 50% failure rate on _basic_ task completion, and that number hasn’t improved in many months.


