Basically I just don't want to hear about "the state of SIMD in Rust" unless it is about a dramatic improvement in autovectorization in the Rust compiler.
80%-90% or so of real life vectorization can be achieved in C or C++ just by writing code in a way that it can be autovectorized. Intrinsics get you the rest of the way on harder code. Autovectorization is essentially a solved problem for the vast majority of floating point code.
Not so with Rust, because of a dogmatic approach to floating-point arithmetic that assumes bitwise reproducibility is the "right" answer for everyone (in fact, it's the right answer for almost nobody), to the point of not even letting a user opt into these optimizations with a flag. And once you're at the point of writing intrinsics, you have to handwrite code for every new architecture, when an autovectorizer could have gotten you 80%-90% of the way there from a single source, and often that is enough.
The usual counterargument is that if a user needs SIMD they can just use some SIMD API and make their intention explicit. That is essentially an argument that we should handwrite intrinsics. Well, guess what: I'm a programmer, and I use compilers because they _do this for me_, and they do it very easily in C or C++ once I tell them I'm OK with reordering operations and other "accuracy-impacting" optimizations.
The huge joke on us is that these optimizations generally have the effect of _improving_ accuracy, because they reduce the number of rounding steps, either by simply reducing the number of operations or by using fused multiply-adds, which round only once.
>80%-90% or so of real life vectorization can be achieved in C or C++ just by writing code in a way that it can be autovectorized.
Yep. I was pleasantly surprised by the autovectorization quality with recent clang at work a few days ago. If you write code that the compiler can infer operates on multiples of 4, 8, etc., it goes off and emits pretty decent NEON/AVX code. The rest, as you say, is handled quite well by intrinsics these days.
Autovectorization was definitely poorer 5-10 years ago on older compiler toolchains.
Yikes. Sounds like we need this in Rust ASAP. (I do a lot of parallelizable code; GPU-centric, but CPU SIMD is a good fallback for machines that don't have NVIDIA GPUs.) I find the manual SIMD packing/unpacking clumsy, especially when managing it in addition to non-SIMD CPU and GPU code.