Any gemma-2-9b or 27b 4-bit GGUFs on HuggingFace yet? Thanks!

luke-stanley · on June 27, 2024

Actually for the 9B model, this has 4-bit quantised weights (and others): https://huggingface.co/bartowski/gemma-2-9b-it-GGUF

Still no 27B 4-bit GGUF quants on HF yet!

I'm monitoring this search: https://huggingface.co/models?library=gguf&sort=trending&sea...

SubiculumCode · on June 28, 2024

https://huggingface.co/bartowski/gemma-2-27b-it-GGUF

thot_experiment · on June 28, 2024

I'm curious about the quantization quality claims in the table there. Is this a Gemma 2 specific thing (more subtlety in the weights somehow)? In my own testing, and in testing I've seen elsewhere, at least for llama3 8B (and in some less rigorous testing with other models), q_8 -> q4_K_M are basically indistinguishable from one another.
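For intuition on why bit width could matter at all, here is a toy sketch of symmetric round-to-nearest quantization at 8 and 4 bits. It is an assumption-laden illustration, not how GGUF does it: types such as q4_K_M are block-based and more elaborate, and the weights here are synthetic Gaussian values rather than anything from a real model.

```python
import random

def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest using one scale for the whole list (toy scheme)."""
    levels = 2 ** (bits - 1) - 1          # 127 levels for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def mean_abs_error(weights, bits):
    """Average absolute difference between original and restored weights."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)

random.seed(0)
# Synthetic stand-in for a weight tensor; NOT real model weights.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)

if __name__ == "__main__":
    print(f"mean abs error at 8-bit: {err8:.4f}")
    print(f"mean abs error at 4-bit: {err4:.4f}")
```

The per-weight error is larger at 4 bits in this toy, but that alone says nothing about whether the difference shows up in model outputs, which is the actual question.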

janwas · on June 28, 2024

Yes, PPL and certain benchmarks do not detect differences from quantization. But recent work gives cause for concern, e.g., https://arxiv.org/pdf/2310.01382, https://arxiv.org/pdf/2405.18137.
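A small sketch of why an averaged metric like PPL can miss this: perplexity is the exponential of the negative mean per-token log-probability, so a large regression on one token can be cancelled by small gains elsewhere. The log-probabilities below are made-up toy numbers, not measurements from any model.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean of per-token natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs for the same four tokens under two models.
baseline = [-1.0, -1.0, -1.0, -1.0]
quantized = [-0.5, -0.5, -0.5, -2.5]   # one token much worse, the rest slightly better

ppl_baseline = perplexity(baseline)
ppl_quantized = perplexity(quantized)

# Largest per-token change, which the average hides.
worst_shift = max(abs(a - b) for a, b in zip(baseline, quantized))

if __name__ == "__main__":
    print(ppl_baseline, ppl_quantized, worst_shift)
```

Both lists have the same mean, so the two perplexities match even though one token's log-probability moved by 1.5.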

luke-stanley · on June 29, 2024

The first paper is a good critique of quantised model performance. It points out that 40-50% 'compression' typically results in only slight loss for RAG tasks relying on in-context learning, but for factual tasks relying on stored knowledge, performance dropped off very quickly. They looked at Vicuna, one of the earlier models, so I wonder how applicable it is to recent models like the Phi 3 range. I don't think deliberate, clever adversarial attacks like those in the second paper are a sensible worry for most people, but it is fun. Thanks for the links @janwas.

XzAeRosho · on June 27, 2024

It's on HuggingFace already: https://huggingface.co/google/gemma-2-9b

luke-stanley · on June 27, 2024

I know the safetensors are there, but I asked for GGUF 4-bit quantised, which is kinda the standard for useful local applications, a typical sweet spot balancing performance and quality. It makes the model much easier to use and it works in more places, be it personal devices or a server.
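As a rough back-of-the-envelope for why 4-bit matters locally, weight storage is about parameters times bits per weight divided by 8. The sketch below assumes the nominal "9B" and "27B" parameter counts from the model names and nominal bit widths; real GGUF files will differ somewhat because quant types carry extra scale data and the true parameter counts are not round numbers.

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Nominal parameter counts taken from the model names (an assumption).
NOMINAL_PARAMS = {"gemma-2-9b": 9e9, "gemma-2-27b": 27e9}

def size_table() -> dict:
    """Map (model name, bit width) to the approximate size in GB."""
    return {
        (name, bits): approx_size_gb(n, bits)
        for name, n in NOMINAL_PARAMS.items()
        for bits in (16, 8, 4)
    }

if __name__ == "__main__":
    for (name, bits), gb in size_table().items():
        print(f"{name} at {bits}-bit: ~{gb:.1f} GB")
```

Under these assumptions the 27B model drops from roughly 54 GB at 16-bit to roughly 13.5 GB at 4-bit, which is the difference between needing server hardware and fitting on a high-end personal machine.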

chown · on June 27, 2024

If you are still looking for it, I just made it available on an app[1] that I am working on with Gemma2 support.

[1] https://msty.app

luke-stanley · on June 27, 2024

Are you saying you put a 4-bit GGUF on HuggingFace?