The dex2oat conversion performed when an application is installed on an Android device generates machine code that’s very specific to its CPU. High-level optimization is performed in earlier passes. AFAIK this approach is successful.
...but those CPUs are still speculative out-of-order super-scalars aren't they?
We're talking about removing those features, on which our entire computing ecosystem is built, and expecting the compiler to be able to pipeline every execution unit individually.
dex2oat is where the work could be done, yes, but as a field we just don't appear to know how to fill processor pipelines like that. Nobody seems to have figured it out, despite several attempts.
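To make "the compiler pipelines every execution unit" a bit more concrete, here is a toy sketch of greedy list scheduling in Python. It is not what dex2oat or any real compiler does. The instruction names, the two units ("mem" and "alu") and all latencies are made up for illustration, and it assumes every latency is fixed and known up front, which is the optimistic case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str         # hypothetical execution unit, e.g. "alu" or "mem"
    latency: int      # cycles until the result is usable (assumed fixed)
    deps: tuple = ()  # names of instructions whose results are needed


def list_schedule(instrs):
    """Greedy list scheduling; returns {instruction name: issue cycle}.

    Assumes the dependency graph is acyclic and that each unit can
    accept one new instruction per cycle.
    """
    by_name = {i.name: i for i in instrs}
    for ins in instrs:
        for d in ins.deps:
            if d not in by_name:
                raise ValueError(f"unknown dependency {d!r} in {ins.name!r}")

    issue = {}       # name -> cycle in which the instruction was issued
    unit_free = {}   # unit -> first cycle in which the unit is free again
    remaining = list(instrs)
    cycle = 0
    while remaining:
        for ins in list(remaining):
            # Ready only when every input has been produced by this cycle.
            ready = all(
                d in issue and issue[d] + by_name[d].latency <= cycle
                for d in ins.deps
            )
            if ready and unit_free.get(ins.unit, 0) <= cycle:
                issue[ins.name] = cycle
                unit_free[ins.unit] = cycle + 1
                remaining.remove(ins)
        cycle += 1
    return issue


# Made-up program: two loads feed an add, plus an independent multiply.
program = [
    Instr("load_a", "mem", 3),
    Instr("load_b", "mem", 3),
    Instr("add", "alu", 1, ("load_a", "load_b")),
    Instr("mul_c", "alu", 3),
    Instr("store", "mem", 1, ("add",)),
]
schedule = list_schedule(program)
for name, cyc in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"cycle {cyc}: {name}")
```

With these made-up numbers the add cannot issue until cycle 4, because it has to wait for the second load. The whole schedule is only as good as the latencies it was built from.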
>...but those CPUs are still speculative out-of-order super-scalars aren't they?
Not universally, even in current-generation devices. For example, the Cortex-A53 and Cortex-A55 are in-order and were explicitly mentioned as safe by ARM.
The Snapdragon 625/626 is an octa-core Cortex-A53 at up to 2.2 GHz, and it is in a lot of the current mid-range devices from almost every major phone manufacturer (Xiaomi, Samsung, Moto, Huawei, Asus, Lenovo, ...)