
Before committing to purchasing two of these, look at the true speeds, which few people post, not just "it works" reports. We're at a point where we can run these very large models "at home", and that's great! But real usage means very large contexts, for both prompt processing and token generation. Whatever speeds these models get at "0" context is very different from what they get at "useful" context, especially for coding and the like.
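
If you want numbers for your own setup instead of anecdotes, you can measure this directly. Here's a minimal sketch, assuming a local OpenAI-compatible server (the URL and model name are placeholders for whatever you actually run, e.g. llama.cpp's server or vLLM). It pads the prompt to a target depth, then times time-to-first-token (roughly prompt processing) separately from the streaming rate after that (generation):

  # Rough benchmark sketch: prompt-processing and generation speed
  # at several context depths, against a local OpenAI-compatible
  # endpoint. URL and MODEL are assumptions; adjust for your setup.
  import time
  import requests

  URL = "http://localhost:8080/v1/completions"  # placeholder endpoint
  MODEL = "local-model"                          # placeholder model name

  def bench(context_tokens: int, gen_tokens: int = 128) -> None:
      # Pad the prompt with filler words to approximate the target
      # depth; one word is very roughly one token, so this is only
      # an estimate of the true context length.
      prompt = "word " * context_tokens
      payload = {
          "model": MODEL,
          "prompt": prompt,
          "max_tokens": gen_tokens,
          "stream": True,
      }
      start = time.time()
      first = None
      count = 0
      with requests.post(URL, json=payload, stream=True) as r:
          for line in r.iter_lines():
              if not line or not line.startswith(b"data: "):
                  continue
              if line == b"data: [DONE]":
                  break
              if first is None:
                  # First streamed token: prompt processing is done.
                  first = time.time()
              count += 1  # roughly one token per streamed chunk
      end = time.time()
      if first is None or count < 2:
          print(f"ctx={context_tokens}: no tokens streamed")
          return
      pp = first - start
      tg = count / (end - first)
      print(f"ctx={context_tokens:6d}  prompt processing: {pp:7.1f}s"
            f"  generation: {tg:5.1f} tok/s")

  for depth in (0, 4096, 16384, 65536):
      bench(depth)

If the tok/s at 65536 is a small fraction of the tok/s at 0, that's the number that matters for coding workloads, not the headline figure.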




DeepSeek-v3.2 should be better for long contexts because it uses (near-linear) sparse attention.
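
A toy illustration of what "near linear" buys you: dense attention scores every query against every key, O(n^2) in context length n, while a top-k sparse variant attends to a fixed k keys per query, O(n*k). This is only a sketch of the general idea, not DeepSeek's actual kernel; the dense scoring used here just to pick indices would be replaced by a cheap indexer in a real implementation.

  # Toy top-k sparse attention vs. dense attention (illustration only).
  import numpy as np

  def dense_attention(q, k, v):
      # q, k, v: (n, d). Full n x n score matrix: O(n^2 * d).
      s = q @ k.T / np.sqrt(q.shape[1])
      w = np.exp(s - s.max(axis=1, keepdims=True))
      w /= w.sum(axis=1, keepdims=True)
      return w @ v

  def topk_sparse_attention(q, k, v, topk=64):
      # Each query attends only to its topk highest-scoring keys, so
      # the softmax/weighted-sum work is O(n * topk * d). Assumes
      # topk < n. (We still score densely to *pick* the indices; a
      # real system uses a lightweight indexer for the selection.)
      s = q @ k.T / np.sqrt(q.shape[1])
      idx = np.argpartition(-s, topk, axis=1)[:, :topk]
      out = np.empty_like(q)
      for i in range(q.shape[0]):
          si = s[i, idx[i]]
          wi = np.exp(si - si.max())
          wi /= wi.sum()
          out[i] = wi @ v[idx[i]]
      return out

  n, d = 1024, 64
  rng = np.random.default_rng(0)
  q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
  out = topk_sparse_attention(q, k, v)

With k fixed, doubling the context doubles the attention work instead of quadrupling it, which is why the slowdown curve at long context should be much flatter than for a dense-attention model.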

Are there benchmarks that effectively measure this? It's essential information when speccing out an inference system: model size, quantization type, and so on.



