
> AMD's developer docs clearly state, and I am quoting,

Please quote something that unambiguously supports your claims. What you've quoted is insufficient.

What I said about a single-socket Rome processor is not "completely incorrect" under any reasonable interpretation. The latency and bandwidth penalties for moving data from one side of the IO die to the other are much smaller than those of the inter-socket links traditionally implied by NUMA, or of the inter-chiplet communication in first-gen EPYC/Threadripper.

If you want to insist that NUMA applies to even the slightest measurable memory-performance asymmetry between cores, please say so, so that we know ahead of time whether the discussion is also going to drift into esoteric details like the ring vs. mesh interconnects on Intel's processors.
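
For what it's worth, numactl --hardware shows what the OS actually exposes, and on Rome that depends on the NPS1/NPS2/NPS4 BIOS setting rather than on anyone's preferred definition. Here is a minimal sketch with libnuma that prints the node count and the reported distance matrix (assumes libnuma is installed; build with gcc check_numa.c -lnuma):

    /* Ask libnuma how many NUMA nodes the OS exposes and what the
     * reported inter-node distances are. */
    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            printf("NUMA not available on this system\n");
            return 1;
        }
        int max_node = numa_max_node();
        printf("NUMA nodes reported by the OS: %d\n", max_node + 1);

        /* numa_distance() returns the ACPI SLIT distance; 10 means "local". */
        for (int i = 0; i <= max_node; i++) {
            for (int j = 0; j <= max_node; j++)
                printf("%4d", numa_distance(i, j));
            printf("\n");
        }
        return 0;
    }

With NPS1 this should report a single node, with NPS4 a 4x4 distance matrix; either way it only tells you what the firmware chose to report, not how large the asymmetry actually is.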



If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25 ns is not relevant. It's ~100 CPU cycles, and it's also about a 25% swing from fastest to slowest.
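
To spell out the arithmetic (the 4 GHz core clock and the ~100 ns baseline DRAM latency below are assumptions picked to make the numbers round, not measurements):

    /* Back-of-the-envelope check of the 25 ns figure. */
    #include <stdio.h>

    int main(void) {
        double delta_ns  = 25.0;   /* extra latency crossing the IO die */
        double clock_ghz = 4.0;    /* assumed core clock */
        double base_ns   = 100.0;  /* assumed fastest-path DRAM latency */

        printf("cycles lost: %.0f\n", delta_ns * clock_ghz);         /* ~100 cycles */
        printf("swing:       %.0f%%\n", 100.0 * delta_ns / base_ns); /* ~25% */
        return 0;
    }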


Intel's server/workstation CPUs have had two memory controllers for the last several generations, so even if the whole CPU is seen as a single NUMA node by the software, the actual memory access latency has always differed from core to core, depending on the core's position on the interconnect mesh or ring.

So what ???

The initial posting was about whether the CPU is seen as one NUMA node or several by the software, not about all cores having equal memory access latency, which has not been true for any server/workstation CPU, from any vendor, for many years.
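
If anyone wants to measure that per-core asymmetry rather than argue about it, a rough sketch: build a pointer chain too large for L3, pin the chase to each logical CPU in turn, and compare ns/load. The buffer size and iteration count here are arbitrary choices, not a calibrated benchmark, and it doesn't distinguish SMT siblings from physical cores. Build with gcc -O2 chase.c:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define ENTRIES (64 * 1024 * 1024 / sizeof(void *))  /* 64 MiB buffer, larger than most L3s */
    #define ITERS   (5 * 1000 * 1000)

    int main(void) {
        /* Build a random cyclic permutation so the prefetchers can't hide the latency. */
        void **buf  = malloc(ENTRIES * sizeof(void *));
        size_t *idx = malloc(ENTRIES * sizeof(size_t));
        for (size_t i = 0; i < ENTRIES; i++) idx[i] = i;
        for (size_t i = ENTRIES - 1; i > 0; i--) {   /* Fisher-Yates; rand() is crude but fine here */
            size_t j = rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < ENTRIES; i++)
            buf[idx[i]] = &buf[idx[(i + 1) % ENTRIES]];

        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        for (long cpu = 0; cpu < ncpus; cpu++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            if (sched_setaffinity(0, sizeof(set), &set) != 0) continue;

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            void **p = (void **)buf[0];
            for (long i = 0; i < ITERS; i++) p = (void **)*p;   /* dependent loads */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("cpu %3ld: %.1f ns/load (%p)\n", cpu, ns / ITERS, (void *)p);
        }
        return 0;
    }

On a chip where every core really had uniform access you'd expect the numbers to be flat; the spread you actually see is the asymmetry being discussed.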



