Maybe the CPU is the wrong place to perform sophisticated optimizations. IMHO, the CPU should be far simpler and the intelligence should be in the compiler.
This has been tried several times and it just doesn't seem to work in practice: delay slots, Itanium, etc.
The CPU is a bit like a JIT in that it can see how the program is really running and optimise for those conditions, which the AOT compiler cannot. Your AOT compiler may not know that you're going to take a branch more often than not, but your CPU may be able to work that out at runtime. And if tomorrow the same code never takes that branch, the CPU will work that out as well.
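To make the "the CPU works it out at runtime" point concrete, here's a sketch of one textbook scheme, a per-branch 2-bit saturating counter. Real CPU predictors are far more elaborate; the class, the branch address 0x400 and the outcome patterns below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

class TwoBitPredictor:
    """Textbook 2-bit saturating-counter predictor (a simplification, not any real CPU's design).

    Each branch address gets a counter in 0..3: 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self):
        # Every branch starts in the "weakly not taken" state.
        self.counters = defaultdict(lambda: 1)

    def predict(self, pc):
        return self.counters[pc] >= 2

    def update(self, pc, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

def accuracy(predictor, pc, outcomes):
    """Predict each outcome, then train on it; return the fraction predicted correctly."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)

# "Today": the branch is taken 9 times out of 10. "Tomorrow": it is never taken.
today = ([True] * 9 + [False]) * 10
tomorrow = [False] * 100

p = TwoBitPredictor()
print(accuracy(p, 0x400, today))     # 0.89
print(accuracy(p, 0x400, tomorrow))  # 0.99
```

The same predictor adapts to both days without any change to the code, whereas a single compile-time "taken" hint would be right 90% of the time today and 0% of the time tomorrow.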
The compiler only knows about the program and maybe statistical information about the data.
The CPU knows about the actual data currently being processed.
Therefore, the CPU can do more by using branch prediction and speculative execution. It costs more energy per computation, but so far it has been worth it. The CPU can also optimize old code on the fly.
As far as I remember, both. It executed x86 code only slowly, and the compilers didn't deliver what was expected. It was a gamble on immature technology. They didn't fail completely, though: HP sold Itanium-based systems for a long time.
The dex2oat conversion, performed when an application is installed on an Android device, generates machine code that's very specific to the device's CPU. High-level optimization is done in earlier passes. AFAIK this approach is successful.
...but those CPUs are still speculative out-of-order super-scalars, aren't they?
We're talking about removing those features, on which our entire computing ecosystem is built, and expecting the compiler to be able to pipeline every execution unit individually.
dex2oat is where the work could be done, yes, but as a field we just don't seem to know how to fill processor pipelines like that, and nobody has managed to figure it out despite several attempts.
>...but those CPUs are still speculative out-of-order super-scalars, aren't they?
Not universally, even in current-generation devices: e.g. the Cortex-A53 and Cortex-A55 are in-order and were explicitly mentioned as safe by ARM.
The Snapdragon 625/626, an octa-core Cortex-A53 at 2.2GHz, is in a lot of current mid-range devices from almost every major phone manufacturer (Xiaomi, Samsung, Moto, Huawei, Asus, Lenovo, ...).
There is a third way: have the CPU JIT-compile the code to a very different internal architecture using fully programmable firmware, as Transmeta did.
I don't know whether the Transmeta CPUs are vulnerable to Spectre and Meltdown, but fixes for both would be just one firmware update away, most probably with little to no performance impact.