
Before committing to purchasing two of these, look at the true speeds, which few people post, not just "it works" reports. We're at a point where we can run these very large models "at home", and that's great! But real usage means very large contexts, for both prompt processing and token generation. Whatever speeds these models get at "0" context is very different from what they get at "useful" context, especially for coding and the like.
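
If you want numbers for your own setup instead of anecdotes, you can measure this directly. Here's a minimal sketch, assuming a local OpenAI-compatible server (the URL and model name are placeholders for whatever you actually run, e.g. llama.cpp's server or vLLM). It pads the prompt to a target depth, then times time-to-first-token (roughly prompt processing) separately from the streaming rate after that (generation):

  # Rough benchmark sketch: prompt-processing and generation speed
  # at several context depths, against a local OpenAI-compatible
  # endpoint. URL and MODEL are assumptions; adjust for your setup.
  import time
  import requests

  URL = "http://localhost:8080/v1/completions"  # placeholder endpoint
  MODEL = "local-model"                          # placeholder model name

  def bench(context_tokens: int, gen_tokens: int = 128) -> None:
      # Pad the prompt with filler words to approximate the target
      # depth; one word is very roughly one token, so this is only
      # an estimate of the true context length.
      prompt = "word " * context_tokens
      payload = {
          "model": MODEL,
          "prompt": prompt,
          "max_tokens": gen_tokens,
          "stream": True,
      }
      start = time.time()
      first = None
      count = 0
      with requests.post(URL, json=payload, stream=True) as r:
          for line in r.iter_lines():
              if not line or not line.startswith(b"data: "):
                  continue
              if line == b"data: [DONE]":
                  break
              if first is None:
                  # First streamed token: prompt processing is done.
                  first = time.time()
              count += 1  # roughly one token per streamed chunk
      end = time.time()
      if first is None or count < 2:
          print(f"ctx={context_tokens}: no tokens streamed")
          return
      pp = first - start
      tg = count / (end - first)
      print(f"ctx={context_tokens:6d}  prompt processing: {pp:7.1f}s"
            f"  generation: {tg:5.1f} tok/s")

  for depth in (0, 4096, 16384, 65536):
      bench(depth)

If the tok/s at 65536 is a small fraction of the tok/s at 0, that's the number that matters for coding workloads, not the headline figure.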




DeepSeek-v3.2 should be better for long contexts because it uses (near-linear) sparse attention.
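
A toy illustration of what "near linear" buys you: dense attention scores every query against every key, O(n^2) in context length n, while a top-k sparse variant attends to a fixed k keys per query, O(n*k). This is only a sketch of the general idea, not DeepSeek's actual kernel; the dense scoring used here just to pick indices would be replaced by a cheap indexer in a real implementation.

  # Toy top-k sparse attention vs. dense attention (illustration only).
  import numpy as np

  def dense_attention(q, k, v):
      # q, k, v: (n, d). Full n x n score matrix: O(n^2 * d).
      s = q @ k.T / np.sqrt(q.shape[1])
      w = np.exp(s - s.max(axis=1, keepdims=True))
      w /= w.sum(axis=1, keepdims=True)
      return w @ v

  def topk_sparse_attention(q, k, v, topk=64):
      # Each query attends only to its topk highest-scoring keys, so
      # the softmax/weighted-sum work is O(n * topk * d). Assumes
      # topk < n. (We still score densely to *pick* the indices; a
      # real system uses a lightweight indexer for the selection.)
      s = q @ k.T / np.sqrt(q.shape[1])
      idx = np.argpartition(-s, topk, axis=1)[:, :topk]
      out = np.empty_like(q)
      for i in range(q.shape[0]):
          si = s[i, idx[i]]
          wi = np.exp(si - si.max())
          wi /= wi.sum()
          out[i] = wi @ v[idx[i]]
      return out

  n, d = 1024, 64
  rng = np.random.default_rng(0)
  q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
  out = topk_sparse_attention(q, k, v)

With k fixed, doubling the context doubles the attention work instead of quadrupling it, which is why the slowdown curve at long context should be much flatter than for a dense-attention model.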

Are there benchmarks that effectively measure this? It's essential information when speccing out an inference system: model size, quantization type, and so on.



