
oh, darn. my least favorite walled garden / vertical monopoly / rent-seeker will have to raise prices. I'm sure they can spin this as a quality improvement.

Most of this was "enthusiasts playing with big-boy stuff", but it turns out OK in the end.

no, you really can't.

NVidia's use of "cores" is simply wrong. unless you think a core is a simple scalar ALU. but cores haven't been like that for decades.

or would you like to count cores in a current AMD or Intel CPU? each "core" has half a dozen ALUs/FP pipes, and don't forget to multiply by SIMD width.


MPI is fine, but have you heard of threads?

Sure, the conventional way of doing things is OpenMP on a node and MPI across nodes (a minimal sketch follows the list below), but

* It just seems like a lot of threads to wrangle without some hierarchy. Nested OpenMP is also possible…

* I’m wondering if explicit communication is better from one die to another in this sort of system.
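
For reference, a minimal sketch of that hybrid convention, assuming an MPI implementation and an OpenMP-capable compiler are available (rank placement, e.g. one rank per die or per NUMA node, is left to the launcher):

    /* hybrid MPI + OpenMP hello: MPI ranks across nodes/dies,
       OpenMP threads sharing memory within each rank */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* FUNNELED: only the main thread makes MPI calls */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        printf("rank %d/%d thread %d/%d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

With Open MPI, something like "mpirun --map-by numa" plus OMP_NUM_THREADS gives one rank per NUMA domain, which is one way to make die-to-die communication explicit while keeping shared memory within a die.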


With 2 IO dies aren't there effectively 2 meta NUMA nodes with 4 leaf nodes each? Or am I off base there?

The above doesn't even consider the possibility of multi-CPU systems. I suspect the existing programming models are quickly going to become insufficient for modeling these systems.

I also find myself wondering how atomic instruction performance will fare on these. GPU ISA and memory model on CPU when?


If you query the NUMA layout tree, you have two sibling hw threads per core, then a cluster of 8 or 12 actual cores per die (up to 4 or 8 dies per socket), then the individual sockets (up to 2 sockets per machine).

Before 8 cores per die (introduced in Zen 3, and retained in 4, 5, and 6), on the Zen 1/1+ and 2 series this would have been two sets of four cores instead of one set of eight (and a split L3 instead of a unified one). I can't remember if the split-CCX layout had its own NUMA layer in the tree or not, or if the CCXes were just iterated in pairs.
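
A quick way to see that layout programmatically is hwloc (a minimal sketch, assuming the hwloc 2.x headers and library are installed; lstopo prints the same tree):

    #include <hwloc.h>
    #include <stdio.h>

    int main(void) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* the tree, top to bottom: sockets -> NUMA nodes -> L3 domains
           (CCX/CCD) -> cores -> hardware threads */
        printf("packages:   %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PACKAGE));
        printf("NUMA nodes: %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE));
        printf("L3 caches:  %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L3CACHE));
        printf("cores:      %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));
        printf("hw threads: %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

        hwloc_topology_destroy(topo);
        return 0;
    }

How many NUMA nodes actually show up depends on firmware settings (NPS1/2/4, and whether the L3-as-NUMA option is enabled on EPYC).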


What I find myself wondering about is the performance impact of cross-thread communication in this scenario. With the nested domains, it seems like there should be different (and increasingly severe) performance implications for crossing each distinct boundary, whereas the languages we write in and the programming models we employ don't seem particularly well suited to expressing how we want our code to adapt to such constraints, at least not in a generalized manner.

I realize that HPC code can be customized to the specific device it will be run on but more widely deployed software is going to want to abstract these increasingly complex relationships.


It's why, if you want high-performance code in this sort of work, you'll want C or C-like code. For example, learn how madvise() is used. Learn how thread-local storage works when implementing it on a hierarchical SMP system. And learn how to build a message-passing system and what "atomic" really means (locks are often not your friend here).
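
A minimal sketch of what that looks like in practice: an mmap()ed buffer with a madvise() hint, handed from one thread to another with a C11 acquire/release flag instead of a lock (the size and the huge-page hint are arbitrary, illustrative choices, not tuning advice):

    /* build: cc -O2 -pthread handoff.c */
    #define _DEFAULT_SOURCE
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define BUF_BYTES (64UL * 1024 * 1024)

    static double *buf;
    static atomic_int ready;          /* 0 = empty, 1 = filled by producer */

    static void *producer(void *arg) {
        (void)arg;
        for (size_t i = 0; i < BUF_BYTES / sizeof(double); i++)
            buf[i] = (double)i;
        /* release: all writes above become visible before the flag flips */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        /* acquire: spin until the producer's writes are visible */
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;
        double sum = 0;
        for (size_t i = 0; i < BUF_BYTES / sizeof(double); i++)
            sum += buf[i];
        printf("sum = %f\n", sum);
        return NULL;
    }

    int main(void) {
        buf = mmap(NULL, BUF_BYTES, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        /* ask for transparent huge pages; a hint, not a guarantee */
        madvise(buf, BUF_BYTES, MADV_HUGEPAGE);

        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }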

Ironically, a lot of people keep shooting themselves in the foot by blindly using MPI or OpenMP or any of the other popular industry-supported frameworks, thinking that will magically bail them out. It doesn't.

The most important thing, above all others: make sure the problem you're solving can actually be parallelized, and that CPUs are the right way of doing it. Once you've answered that question, and the answer is yes, you can just write it pretty much normally.

Also, ironically, you can write Java that isn't shit and takes advantage of systems like these. Sun and the post-Sun community put a lot of work into the HotSpot JVM to make it scale alarmingly well on high-core-count machines. Java used correctly is performant.


Chips and Cheese did some measurements for previous AMD generations; there's a nice core-to-core latency chart a little past the halfway point of the page.

https://chipsandcheese.com/p/genoa-x-server-v-cache-round-2
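
For the curious, a sketch of how that kind of core-to-core number is typically measured: two threads pinned to chosen CPUs bounce a flag in a shared cache line back and forth, and you divide the elapsed time by the number of hops (Linux-specific affinity calls; the CPU numbers and round count are arbitrary):

    /* build: cc -O2 -pthread pingpong.c ; run: ./a.out <cpuA> <cpuB> */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ROUNDS 1000000

    static _Atomic int flag;          /* 0 = ping's turn, 1 = pong's turn */

    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *pong(void *arg) {
        pin_to_cpu(*(int *)arg);
        for (int i = 0; i < ROUNDS; i++) {
            while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
                ;
            atomic_store_explicit(&flag, 0, memory_order_release);
        }
        return NULL;
    }

    int main(int argc, char **argv) {
        int cpu_a = argc > 1 ? atoi(argv[1]) : 0;
        int cpu_b = argc > 2 ? atoi(argv[2]) : 1;

        pthread_t t;
        pthread_create(&t, NULL, pong, &cpu_b);
        pin_to_cpu(cpu_a);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {
            atomic_store_explicit(&flag, 1, memory_order_release);
            while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
                ;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        pthread_join(t, NULL);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        /* one round = two one-way transfers */
        printf("cpu %d <-> cpu %d: ~%.1f ns per one-way hop\n",
               cpu_a, cpu_b, ns / ROUNDS / 2.0);
        return 0;
    }

Run it for pairs of CPUs in the same CCX, across CCXes, across IO dies, and across sockets, and the boundaries show up clearly in the numbers.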


There should be plenty of existing programming models that can be reused, because HPC used single-system-image, multi-hop NUMA machines a lot before Beowulf clusters took over.

Even today, I think very large enterprise systems (where a single kernel runs on a single system that spans multiple racks) are built like this, too.


these big high-core systems do scale, really well, on the workloads they're intended for: not games, desktops, web/db servers, or lightweight stuff like that, but scientific and engineering work - simulations and the like, where they fly! enough that the HPC world still tends to use dual-socket servers. maybe less so for AI, where, at least in the past, you'd only need a few cores per hefty GPU - though possibly KV-cache work is giving CPUs more to do...

> not ... web/db servers, lightweight stuff like that.

They scale very well for web and db servers as well. You just put lots of containers/VMs on a single server.

AMD EPYC has a separate core design specifically for such workloads. It's a bit weaker, runs at lower frequency and power, and takes less silicon area. This way AMD can put more such cores on a single CPU (192 Zen 5c cores vs 128 Zen 5). So it's the other way round - web servers love high-core-count CPUs.


not really - you can certainly put lots of lightweight services on it, but they don't scale, because each core doesn't really get that much cache or memory bandwidth. it's not bad, just not better.

Not true. Look up the Siena chips and something like the ASUS S14NA-U12: six DDR5-4800 channels, two physical PCIe 5.0 ports, two M.2 ports, and six MCIO x8 ports - all lanes at full bandwidth. The 8434PN CPU gets you 48 physical cores in a 150W envelope. Zen 4c really is magic, and there's LOTS of bandwidth to play with.

> not games, desktops, web/db servers, lightweight stuff like that.

Things like games, desktops, browsers, and such were designed for computers with a handful of cores, but the core count will only go up on these devices - a very pedestrian desktop these days has more than 8 cores.

If you want to make software that’ll run well enough 10 years from now, you’d better start using something that looks like the computers of 10 years from now. A 256-core chip might be just that.


why do you think lightweight uses will ever scale to lots of cores?

the standard consumer computer of today has only a few cores that race-to-sleep, because there simply isn't that much to do. where do you imagine the parallel work will come from? even for games, will work shift off the GPU onto the host processor? seems unlikely.

future-proofing isn't about inflating your use of threads, but being smart about memory and IO. those have been the bottleneck for decades now.


> why do you think lightweight uses will ever scale to lots of cores?

Because the cores will be there regardless. At some point, machines will be able to do a lot of background activity and learn about what we are doing, so that local agentic models can act as better intelligent assistants. I don't know what the killer app for the kilocore desktop will be - nobody knows that - but when PARC built a workstation with bit-mapped graphics out of a semi-custom minicomputer that could easily have served a department of text terminals, we got GUIs, WYSIWYG, Smalltalk, and a lot of other fancy things nobody had imagined back then.

You can try to invent the future using current tech, or you might just try to see what's possible with tomorrow's tools and observe it first hand.


those who remember the past are doomed to repeat it?

seriously, this doesn't seem like a useful argument, regardless of whether it's true. the fact that humans have committed ecocide in the past doesn't seem like a reason to continue...


> this doesn't seem like a useful argument

It's not. It's a comforting lie to justify inaction. You see it a lot when people justify not voting or civically engaging.

To be clear, I am doing jack shit about deep-sea mining. But that's a choice I'm making and I own it, even if it makes me uncomfortable. (And there are plenty of cases where that discomfort drives folks into action, however minor.)


I didn't read it as a justification for inaction but rather a reality check. The tone of the parent seemed to imply that the current situation is somehow unusual or unexpected.

The difference between "he's gone mad", which seems to imply that an urgent response is warranted, and "unsurprisingly, his long-standing madness continues".


It's genetic; you can't change the character of an exploitative predator just by wishing for it.

Dredging is immoral - so incredibly destructive to the ecosystem. Do you harvest apples by bulldozing the orchard and sifting out the fruit? It's bad enough that so much of our farming is still based on tilling.

I don't really understand why nodule gathering isn't already done this way - with some kind of robotic fingers or aimed suction devices. It's not as if nodules are hard to discriminate. Sure, there would be some interesting engineering challenges in operating equipment at scale in that environment, but it's undergrad-engineering-club level, not rocket science...


> Do you harvest apples by bulldozing the orchard and sifting out the fruit?

Isn't this how we harvest cranberries?

> don't really understand why nodule gathering isn't already done - just with some kind of robotic fingers or aimed suction devices

The nodules may facilitate some weird deep-sea electrolysis that lets these ecosystems respire. Removing the nodules delicately is better than dredging. But it may still be a death sentence.


>Isn't this how we harvest cranberries?

Nope. They have harvesting equipment that leaves the plants in place.

Peanuts maybe?


by not conflating "internet" with "social network / doomscrolling".

no, seriously, that's the main thing. nothing about "internet" says you have to let ad-motivated recommendation engines fill your gullet.

let's call it "mindful internet": you deliberately choose what to read, even if that means, gasp, seeking out a few quality sources, checking in with a few worthwhile people. "influencers" are so ick precisely because they are a gross side-effect of doomscrolling.


why not use "affordances"? it's the correct word, and even though it's low-frequency, wouldn't that pull people into the article?


The article mentions affordances. I assume the title uses doorknobs because that's a more familiar word as you point out.


no, "doorknob" is merely higher-frequency due to its other meanings. it's never used in this context - probably because it's a terrible affordance (see Norman - push or pull?)


"Doorknobs" is a more commonly used word, I'll grant you, but it meant nothing to me at all, whereas "affordances" would have.


yes, it's misleading clickbait.

the author's apparent epiphany is realizing that init is just a program. the kernel is, of course, software as well, but it does injustice to both "program" and "kernel" to lump them together.

