They are comparing how long it takes to run 125K training steps, not how long it takes to reach a given accuracy.

In section 4.8 they compare accuracy after the same amount of training time for the largest variant of each model and show that ALBERT is substantially better.
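
To make the distinction concrete, here is a toy sketch of the three measurements. Everything in it is an assumption for illustration: the two models, their per-step costs, and the saturating learning curve are made up and are not numbers from the paper. It only shows that "faster to 125K steps" and "better at equal training time" can point in opposite directions.

```python
import math

# Hypothetical models; all numbers are invented for illustration only.
MODELS = {
    "model_a": {"sec_per_step": 1.0, "rate": 2e-5},  # cheap steps, learns little per step
    "model_b": {"sec_per_step": 3.0, "rate": 1e-4},  # costly steps, learns more per step
}

def accuracy(steps, rate):
    """Toy saturating learning curve: approaches 1.0 as steps grow."""
    return 1.0 - math.exp(-rate * steps)

def time_to_steps(model, steps):
    """Wall-clock seconds to run a fixed number of steps."""
    return model["sec_per_step"] * steps

def time_to_accuracy(model, target):
    """Wall-clock seconds to reach a target accuracy on the toy curve."""
    steps_needed = -math.log(1.0 - target) / model["rate"]
    return model["sec_per_step"] * steps_needed

def accuracy_at_time(model, seconds):
    """Accuracy reached after a fixed wall-clock budget."""
    steps_done = seconds / model["sec_per_step"]
    return accuracy(steps_done, model["rate"])

a, b = MODELS["model_a"], MODELS["model_b"]
print("time to 125K steps:   ", time_to_steps(a, 125_000), time_to_steps(b, 125_000))
print("time to 0.9 accuracy: ", time_to_accuracy(a, 0.9), time_to_accuracy(b, 0.9))
print("accuracy after 100K s:", accuracy_at_time(a, 100_000), accuracy_at_time(b, 100_000))
```

With these made-up numbers, model_a finishes 125K steps sooner, while model_b reaches the target accuracy sooner and is ahead at the equal time budget. The second and third measurements are the kind of comparison section 4.8 makes.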


