I played with gemma-3-4b-it-qat recently using a mid-tier graphics card and a few things stood out to me:
1. It was very fast, between 35 and 70 tokens per second, with initial response in under 200ms. That kind of speed is a feature.
2. It was very useful. I had a brainstorming session with it that was both fluid and fruitful.
3. I can't wrap my head around so much knowledge being contained in about 3GB of data. It seems to know something about everything. Imperfect, but very useful.