tl;dr GPU drivers made by various vendors do not sanitize compute unit hardware ...

bee_rider · on Jan 16, 2024

These designs all come out of the consumer and gaming space. They correctly trade security away for performance. Blame whoever started running untrustworthy code on them.

These vulnerabilities will continue happening. What I don’t understand is how anybody can be surprised at this point. If anyone out there missed the first dozen instances of this: workloads on modern hardware can’t be isolated.

pjmlp · on Jan 17, 2024

They incorrectly trade security for performance, in the days of money transactions in games, esports, and server dependencies.

bee_rider · on Jan 17, 2024

I built a trampoline out of laptops, but I hurt my knee, I don’t know why Dell insists on making these dangerously sharp and hard trampoline parts.

pjmlp · on Jan 17, 2024

It isn't really the same, but no worries, they will be forced to adapt to the upcoming cybersecurity laws.

Just like someone can sue the maker of a trampoline made out of laptops when they cut themselves on them.

bee_rider · on Jan 17, 2024

It depends on how the laws are written. Legislators are often wrong about tech.

But if you are saying Amazon and other cloud providers should be sued, I agree, companies that use parts not fit for the application they are going for should be sued out of business.

That said, the existence of the parts isn’t a problem and I hope manufacturers keep making high performance parts for those of us who don’t use them for crazy and inappropriate things. Manufacturers could be slapped on the wrist for selling their consumer chips as server chips, but I think this is less of a problem because anyone who falls for it is already a walking catastrophe.

kridsdale1 · on Jan 17, 2024

Yeah it’s pretty insane that the entire AI hype and crypto hype industry waves are actually just getting really specific pixel shaders to sparkle in extremely specific ways.

The gear was made to frag noobs in Counter-Strike at 300 FPS.

vlovich123 · on Jan 16, 2024

And Imagination.

Notably, Intel and Nvidia were not impacted. I wonder if the security hardening that Google worked on with Nvidia for Stadia helped prevent this.

Veserv · on Jan 16, 2024

Jeez, I hope it did not require “security hardening” for Nvidia to do something this basic. If these other vendors missed some tiny corner resulting in state leakage, that would be understandable. But, forgetting to clear local memory is just inexcusable.

Imagine an OS forgetting to replace your general purpose registers across context switches. Only a rank incompetent and useless security process would let something like that get all the way through to deployment.

vlovich123 · on Jan 16, 2024

Vendors have consistently ignored multi-tenant issues when coding because gaming doesn’t need it and cloud traditionally hasn’t used GPUs all that much.

You’d be surprised by how many security issues exist in GPU drivers.

ks6g10 · on Jan 16, 2024

Could be that the shared memory (at least in the past) was also used as cache, so the same mechanism that probably sanitizes the cache is/was in play here.

jmgao · on Jan 16, 2024

Google used AMD GPUs for Stadia, not NVIDIA.

vlovich123 · on Jan 16, 2024

Eventually for the product. My memory may be faulty but I talked with engineers working on it during development and I’m pretty sure the initial development was on Nvidia.

kg · on Jan 16, 2024

Could imagine it being related to CUDA as well. Memory being consistently zero-initialized helps prevent application bugs.

vlovich123 · on Jan 16, 2024

Maybe, but multi-tenant GPU use cases only really come up for cloud, and cloud GPU popularity is only a little more recent.

kg · on Jan 16, 2024

Wasn't too long ago that stray texture data would be left lying around from other processes, too. You could exploit that to read the contents of the user's banking tabs and stuff like that.

dist-epoch · on Jan 16, 2024

It's not so simple.

For example, Windows takes over GPU memory control: it virtualizes it, allocates it to various applications, zeros it, etc.

ks6g10 · on Jan 16, 2024

This is specific to local/scratch memory, which is not exposed for allocation in the same way DRAM is.
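
To make that distinction concrete, here is a toy Python model, not real GPU code. `ComputeUnit`, `victim`, `listener`, and `leaked_words` are all made-up names for illustration, and the `clear_between_kernels` flag is just a stand-in for whatever a driver might do between kernels. The only point is that local memory is handed to each kernel as-is rather than being freshly allocated and zeroed.

```python
class ComputeUnit:
    """Toy stand-in for one GPU compute unit with a fixed block of local memory."""

    def __init__(self, local_words=8, clear_between_kernels=False):
        self.local = [0] * local_words
        self.clear_between_kernels = clear_between_kernels

    def run(self, kernel):
        # The kernel gets raw access to the same local memory every time;
        # nothing is allocated or zeroed on its behalf beforehand.
        result = kernel(self.local)
        if self.clear_between_kernels:
            # Hypothetical mitigation: wipe local memory after each kernel.
            for i in range(len(self.local)):
                self.local[i] = 0
        return result


def victim(local):
    # Leaves "secret" intermediate values in local memory and exits.
    for i in range(len(local)):
        local[i] = 0xC0DE + i


def listener(local):
    # Reads local memory without ever having written to it.
    return list(local)


def leaked_words(clear_between_kernels):
    # Run the victim, then the listener, on the same compute unit and
    # report any nonzero words the listener managed to observe.
    cu = ComputeUnit(clear_between_kernels=clear_between_kernels)
    cu.run(victim)
    return [w for w in cu.run(listener) if w != 0]
```

In this sketch, `leaked_words(False)` hands back everything the victim wrote, while `leaked_words(True)` comes back empty. Real hardware has many compute units, scheduling, and partial overwrites, none of which is modeled here.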

throwitaway222 · on Jan 17, 2024

Man so this has been an issue for 20 years, we suddenly invent LLMs and you want your money back on Nvidia.