
I hope to see dedicated GPU coprocessors disappear sooner rather than later, just like arithmetic coprocessors did.


Arithmetic co-processors didn't disappear so much as they moved onto the main CPU die. There were performance advantages to having the FPU on the CPU, and there were no longer significant cost advantages to having the FPU be separate and optional.

For GPUs today and in the foreseeable future, there are still good reasons for them to remain discrete, in some market segments. Low-power laptops have already moved entirely to integrated GPUs, and entry-level gaming laptops are moving in that direction. Desktops have widely varying GPU needs ranging from the minimal iGPUs that all desktop CPUs now already have, up to GPUs that dwarf the CPU in die and package size and power budget. Servers have needs ranging from one to several GPUs per CPU. There's no one right answer for how much GPU to integrate with the CPU.


By "GPU" they probably mean "matrix multiplication coprocessor for AI tasks", not actually a graphics processor.


That doesn't really change anything. The use cases for a GPU in any given market segment don't change depending on whether you call it a GPU.

And for low-power consumer devices like laptops, "matrix multiplication coprocessor for AI tasks" is at least as likely to mean NPU as GPU, and NPUs are always integrated rather than discrete.


Yes it does change something.

A GPU needs to run $GAME from $CURRENT_YEAR at 60 fps despite the ten million SLoC of shit code and legacy cruft in $GAME. That's where the huge expense for the GPU manufacturer lies.

Matrix multiplication is a solved problem, and we only need to implement it in hardware once. At some point matrix multiplication will be as ubiquitous as floating-point is now.
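
For illustration, here's how small the mathematical contract actually is; this is a naive reference sketch, not how any real accelerator is programmed:

    # Reference GEMM: C = A @ B. The contract is tiny; the hard part for
    # hardware is bandwidth, tiling, and precision, not the arithmetic itself.
    def matmul(A, B):
        n, k, m = len(A), len(B), len(B[0])
        assert all(len(row) == k for row in A)
        C = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
        return C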


You're completely ignoring that there are several distinct market segments that want hardware to do AI/ML. Matrix multiplication is not something you can implement in hardware just once.

NVIDIA's biggest weakness right now is that none of their GPUs are appropriate for any system with a lower power budget than a gaming laptop. There's a whole ecosystem of NPUs in phone and laptop SoCs targeting different tradeoffs in size, cost, and power than any of NVIDIA's offerings. These accelerators represent the biggest threat NVIDIA's CUDA monopoly has ever faced. The only response NVIDIA has at the moment is to start working with MediaTek to build laptop chips with NVIDIA GPU IP and start competing against pretty much the entire PC ecosystem.

At the same time, all the various low-power NPU architectures have differing limitations owing to their diverse histories, and approximately none of them currently shipping were designed from the beginning with LLMs in mind. On the timescale of hardware design cycles, AI is still a moving target.

So far, every laptop or phone SoC that has shipped with both an NPU and a GPU has demonstrated that there are some AI workloads where the NPU offers drastically better power efficiency. Putting a small-enough NVIDIA GPU IP block onto a laptop or phone SoC probably won't be able to break that trend.

In the datacenter space, there are also tradeoffs that mean you can't make a one-size-fits-all chip that's optimal for both training and inference.

In the face of all the above complexity, whether a GPU-like architecture retains any actual graphics-specific hardware is a silly question. NVIDIA and AMD have both demonstrated that they can easily delete that stuff from their architectures to get more TFLOPs for general compute workloads out of the same amount of silicon.


Wondering how you'd classify Gaudi, tenstorrent-stuff, groq, or lightmatter's photonic thing.

Calling something a GPU tends to make people ask for (good, performant) support for OpenGL, Vulkan, Direct3D... which seems like a huge waste of effort if you want to be an "AI coprocessor".


> Wondering how you'd classify Gaudi, tenstorrent-stuff, groq, or lightmatter's photonic thing.

Completely irrelevant to consumer hardware, in basically the same way as NVIDIA's Hopper (a data center GPU that doesn't do graphics). They're ML accelerators that for the foreseeable future will mostly remain discrete components and not be integrated onto Xeon/EPYC server CPUs. We've seen a handful of products where a small amount of CPU gets grafted onto a large GPU/accelerator to remove the need for a separate host CPU, but that's definitely not on track to kill off discrete accelerators in the datacenter space.

> Calling something a GPU tends to make people ask for (good, performant) support for OpenGL, Vulkan, Direct3D... which seems like a huge waste of effort if you want to be an "AI coprocessor".

This is not a problem outside the consumer hardware market.


Consumer hardware and AI inference are joined at the hip right now due to perverse historical reasons.

AI inference's big bottleneck right now is RAM and memory bandwidth, not so much compute per se.
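
A back-of-envelope sketch of why (the model size and bandwidth figures here are illustrative assumptions, not measurements):

    # Generating one token requires streaming roughly all of the weights once.
    weight_bytes = 7e9 * 2      # assume a 7B-parameter model at 2 bytes/weight = 14 GB
    bandwidth = 100e9           # assume ~100 GB/s of memory bandwidth (dual-channel DDR5 ballpark)
    print(bandwidth / weight_bytes)  # ~7 tokens/s ceiling, regardless of available FLOPs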

If we redid AI inference from scratch without consumer gaming considerations then it probably wouldn't be a coprocessor at all.


Aspects of this have been happening for a long time, in the form of SIMD extensions and multi-core packaging.

But there is much more to a discrete GPU than vector instructions or parallel cores. It has very different memory and cache systems with very different synchronization tradeoffs. It's like an embedded computer hanging off your PCIe bus, and that computer does not have the same stable architecture as the general-purpose CPU running the host OS.
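
You can see the seam even through a high-level framework. A minimal sketch, assuming PyTorch and a CUDA device, purely for illustration:

    import torch

    x = torch.randn(4096, 4096)   # lives in host RAM, managed by the host OS
    xg = x.to("cuda")             # explicit copy across PCIe into the GPU's own memory
    yg = xg @ xg                  # enqueued asynchronously on the device's own scheduler
    y = yg.cpu()                  # explicit copy back, which forces a synchronization point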

In some ways, the whole modern graphics stack is a sort of integration and commoditization of the supercomputers of decades ago. What used to be special vector machines and clusters full of regular CPUs and RAM has moved into massive chips.

But as other posters said, there is still a lot more abstraction in the graphics/numeric programming models and a lot more compiler and runtime tools to hide the platform. Unless one of these hidden platforms "wins" in the market, it's hard for me to imagine general purpose OS and apps being able to handle the massive differences between particular GPU systems.

It could easily be like prior decades, when multicore wasn't taking off because most apps couldn't really use it, or when special designs like the Cell processor in the PlayStation 3 required very dedicated development to use effectively. The heterogeneity of system architectures makes general-purpose reuse hard, and makes it hard to "port" software that wasn't written with the platform in mind.


That was one of the ideas behind Larrabee [1]. You can run Mesa on the CPU today using the llvmpipe backend.

https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
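
A quick way to see llvmpipe in action, assuming a Linux box with Mesa's llvmpipe driver and glxinfo (from mesa-utils) installed:

    import os, subprocess

    # Force Mesa onto its software rasterizer, then check what the GL stack reports.
    env = dict(os.environ, LIBGL_ALWAYS_SOFTWARE="1", GALLIUM_DRIVER="llvmpipe")
    subprocess.run(["glxinfo", "-B"], env=env)  # renderer string should mention llvmpipe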



