Towards Optimal LLM Quantization (picovoice.ai)
18 points by bejager on June 2, 2024 | 8 comments


How does it compare with AWQ, SqueezeLLM, or newer quantization methods?


How do you integrate with vLLM?


Is there a way for me to compress a custom fine-tuned model of my own?


Not yet, but it's something we have in mind as a future feature.


Decent platform support - any plans for a Rust SDK?


We are continuously working on expanding SDK support; Rust is also on the list.


Any benchmarks with Falcon 2?


We don't support Falcon 2 yet, but new models are always on our radar for addition to the platform.



