You just have to allow more than 75% of memory to be allocated to the GPU by running sudo sysctl -w iogpu.wired_limit_mb=30720 (for a 30 GB limit in this case).
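In case it helps anyone picking a different limit: the value is in megabytes, so 30 GB works out to 30 * 1024 = 30720. Here is a rough sketch of that arithmetic. The helper name limit_mb is just something made up for illustration, and the snippet only prints the command instead of running it, so you can check the number first.

```shell
# Hypothetical helper (not part of macOS): convert a limit in GB
# to the MB value that iogpu.wired_limit_mb expects.
limit_mb() {
  echo $(( $1 * 1024 ))
}

# Print the command for a 30 GB limit rather than executing it.
echo "sudo sysctl -w iogpu.wired_limit_mb=$(limit_mb 30)"
```

Swap the 30 for whatever fits your machine, leaving enough memory for the OS and other apps.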
1. That worked after some tweaking.
2. I had to lower the context window size to get LM Studio to load it up.
3. LM Studio has two distinct checkboxes that both say "Apple Metal GPU". No idea if they do the same thing.
Thanks a ton! I'm now running Mixtral 8x Instruct Q4_K_M on the GPU. Throughput is about 4x what CPU-only was (now around 26 tok/sec).
I can have it run on 'cpu', which is very slow, but offloading to the GPU runs out of memory.