Thanks, been using Gemma 2 a lot at home as it still holds up very well and the 9B version runs great on my 2080Ti. Strong prompt adherence coupled with overall capability makes it very useful. Looking forward to trying Gemma 3.
I have some dumb questions though, might as well ask. How do you decide on the model sizes? And how do you train them? Independently or are they related somehow?
Picking model sizes is not an exact science. We look for sizes that will fit quantized on different categories of devices (e.g., low-end and high-end smartphones, laptops, 16GB GPUs, and bigger GPUs/TPUs). We also want the ratio of model width to depth (number of layers) to be consistently around 90, which we found works best.
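To make that concrete, here's a back-of-envelope sketch of both constraints. The parameter counts and width/depth figures are approximate public Gemma 2 configs plugged in purely for illustration, not an official sizing tool:

```python
# Rough sizing check: weight footprint at a given quantization bit width,
# plus the width-to-depth ratio mentioned above. Config numbers are
# approximate public Gemma 2 values, used only for illustration.

GEMMA2_CONFIGS = {
    # name: (param count, hidden width d_model, number of layers)
    "gemma-2-2b":  (2.6e9, 2304, 26),
    "gemma-2-9b":  (9.2e9, 3584, 42),
    "gemma-2-27b": (27.2e9, 4608, 46),
}

def quantized_gb(params: float, bits: int = 4) -> float:
    """Weight memory in GB at the given bit width (ignores KV cache and overhead)."""
    return params * bits / 8 / 1e9

for name, (params, width, depth) in GEMMA2_CONFIGS.items():
    print(f"{name}: ~{quantized_gb(params):.1f} GB at 4-bit, "
          f"width/depth = {width / depth:.0f}")
```

Under these assumptions the 9B comes out around 4-5 GB of weights at 4-bit, which lines up with it running comfortably on an 11GB 2080 Ti, and the width/depth ratios land in the ~85-100 range around that target of 90.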
The models are trained with distillation from a bigger teacher. We train them independently, but for v3 we have unified the recipes for 4B-27B, to give you more predictability when scaling up and down across model sizes.
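For intuition on the distillation part, here's a minimal sketch of training a student against a teacher's soft targets. This is generic logit distillation in PyTorch, not the exact Gemma recipe; the function name and temperature handling are illustrative:

```python
# Minimal sketch of soft-target knowledge distillation (generic, not the
# actual Gemma training recipe). The student is pushed toward the
# teacher's full token distribution rather than one-hot labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence from the teacher's soft distribution to the student's.

    Both tensors are (batch, seq_len, vocab_size); the teacher runs in
    inference mode, so its logits carry no gradient.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # kl_div expects log-probs as input and probs as target by default.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (t ** 2)  # standard scaling so gradient magnitude matches CE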
One unexpected (to me) use-case appeared not long ago when I found myself without internet but wanting to fix some non-standard Linux configuration issue. As a Windows guy I tend to web search such things, but this time a local LLM came to the rescue!
Even a smaller model like Gemma 2 9B has enough compressed knowledge that it helped me quickly solve my issue.
This got me thinking how such smaller but very capable models might be a game-changer in communities where internet is unavailable or too expensive for continuous use. It's almost like having a portion of the internet in a box, just add electricity.