In Section 4.8 they compare accuracy after roughly the same amount of training time for the largest configuration of each model (BERT-large versus ALBERT-xxlarge) and show that ALBERT is substantially better.