I wonder how well suited some of the smaller LLMs like Qwen 0.6B would be suited...

accrual · 2025-07-15T17:51:59 1752601919

I wondered similar. Perhaps a local model cached in a 16GB or 24GB graphics card would perform well too. It would have to be a quantized/distilled model, but maybe sufficient, especially with some additional training as you mentioned.

jszymborski · 2025-07-15T17:56:27 1752602187

If Qwen 0.6B is suitable, then it could fit in 576MB of VRAM[0].

https://huggingface.co/unsloth/Qwen3-0.6B-unsloth-bnb-4bit

numpad0 · 2025-07-16T16:08:46 1752682126

or on a single Axera AX630C module: https://www.youtube.com/watch?v=cMF6OfktIGg&t=25s

otabdeveloper4 · 2025-07-15T20:03:40 1752609820

16Gb is way overkill for this.