Well it turned out that for running scalar code with branches and stack frames exposing too much to the compiler was not helpful, especially as transistor budgets increased. So as long as we program with usual functions and conditionals this is what we have.
I can swap out a cpu for one with better IPC and hardware scheduling in 10 minutes but re-installing binaries, runtime libraries, drivers, firmware to get newly optimized code -- no way. GPU drivers do this a bit and it's no fun.
I can swap out a cpu for one with better IPC and hardware scheduling in 10 minutes but re-installing binaries, runtime libraries, drivers, firmware to get newly optimized code -- no way. GPU drivers do this a bit and it's no fun.