That's not a bad suggestion. I thought about adding a numerical score, but it felt a bit overwhelming at the time. Maybe I should revisit it, though, in the form of:
I agree with this. Some of those are "passing" and others are really passing, especially given how much better some of the new models are compared to the old ones.
The paws one is a good example, where I think the new model got 100% while the other was more like 75%.
Perhaps having an option outside of pass/fail would be an easy cop-out from actually making a decision.