
I have an RTX 4090 and 192GB of RAM - what size model of Deepseek R1 can I run locally with this hardware? Thank you!


AFAIK you want a model that will sit within the 24GB VRAM on the GPU and leave a couple of gigs for context. Once you start hitting system RAM on a PC you're smoked. It'll run, but you'll hate your life.
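If you want a rough sanity check of what fits, here's a back-of-envelope sketch in Python. The overhead factor and headroom numbers are just assumptions, not measured values:

    # Rough VRAM check for a quantized model (weights only, plus assumed overhead).
    def fits_in_vram(params_billion, quant_bits, vram_gb=24.0, headroom_gb=2.0):
        weight_gb = params_billion * quant_bits / 8      # e.g. 32B at 4-bit ~= 16 GB
        overhead_gb = weight_gb * 0.10                   # assumed runtime/activation overhead
        return weight_gb + overhead_gb + headroom_gb <= vram_gb

    # Which R1 distill sizes plausibly fit on a 24 GB 4090 at 4-bit?
    for size in (1.5, 7, 8, 14, 32, 70):
        print(f"{size:>5}B @ 4-bit: {'fits' if fits_in_vram(size, 4) else 'too big'}")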

Have you ever run a local LLM at all? If not, it is still a little annoying to get running well. I would start here:

https://www.reddit.com/r/LocalLLaMA/


You can't run the big R1 in any useful quant, but you can use the distilled models with your setup. They've released MIT-licensed distills of Qwen (1.5B, 7B, 14B, and 32B) and Llama 3 (8B and 70B), fine-tuned on 800k samples generated by R1. They're pretty impressive, so try them out.
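If you want to poke at one of them straight from Python, a minimal sketch with Hugging Face transformers looks like this. The repo id and dtype are assumptions (the 7B distill at bf16 is roughly 15 GB so it should fit in 24 GB; for the 32B you'd want a 4-bit quant instead):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=512)
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))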


Download something like LM Studio (no affiliation), which is a bit easier to use than Ollama if you'd rather avoid the terminal, and start downloading/loading models :)
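Once LM Studio (or Ollama) is serving a model, you can also hit it from Python through the OpenAI-compatible endpoint they expose. The port and model name below are whatever your local server reports, not fixed values:

    from openai import OpenAI

    # LM Studio's local server defaults to port 1234; Ollama uses 11434.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="deepseek-r1-distill-qwen-14b",  # assumed local model name
        messages=[{"role": "user", "content": "Summarize what a KV cache is."}],
    )
    print(resp.choices[0].message.content)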



