x32 is clearly a good technical solution. If you're running on a 64 bit kernel, x32 is superior to i686 in every measurable way.
But it's also yet another architecture. It's not binary compatible with either i686 or x86_64. You need your whole userspace compiled to use it. Middleware with embedded assembly (there's a surprising amount of this in glibc, and of course things like ffmpeg, libjpeg, etc...) needs to be ported. You can't run 32 bit binaries from proprietary sources.
And frankly the benefit over straight x86_64 is quite modest. I don't see x32 taking off. It's just not worth the hassle.
I thought that the work that goes into multiarch support would allow you to run a single kernel and mix and match x32 and x86_64 binaries on the same system, but I might be wrong (of course, that would require a separate copy of every library dependency).
Some numbers mentioned on the x32abi page hint at performance gains of anywhere between 4% and 40%. If that is true, then I'd think the benefits would outweigh the hassle of another architecture.
(Edit: Most middleware also ships straight-C versions of the routines. Whether or not an x32 C compiler can measure up to handcrafted x86 or x86_64 assembly I don't know, but I'm guessing the much higher register count compared to i686 would help a lot. Regarding proprietary software: there are a great many server configurations that do not need anything beyond the standard open source packages available in Debian.)
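To make the footprint argument concrete, here is a rough sketch of how much a pointer-heavy structure shrinks when its links are 4 bytes wide instead of 8. The `Node32`/`Node64` layouts and the `footprint` helper are made up for illustration; they only mimic what a compiler would lay out for a doubly linked list node, and they say nothing about actual speedups.

```python
import ctypes


class Node32(ctypes.Structure):
    # Links stored as 32-bit values, as they would be with 4-byte pointers.
    _fields_ = [
        ("next", ctypes.c_uint32),
        ("prev", ctypes.c_uint32),
        ("value", ctypes.c_int32),
    ]


class Node64(ctypes.Structure):
    # Same node with 64-bit links, as on plain x86_64.
    _fields_ = [
        ("next", ctypes.c_uint64),
        ("prev", ctypes.c_uint64),
        ("value", ctypes.c_int32),
    ]


def footprint(node_type, count):
    """Bytes needed for `count` nodes of the given layout (padding included)."""
    return ctypes.sizeof(node_type) * count


if __name__ == "__main__":
    nodes = 1_000_000
    for node_type in (Node32, Node64):
        print(node_type.__name__, footprint(node_type, nodes), "bytes")
```

The smaller layout means more nodes per cache line, which is presumably where most of any gain for pointer-chasing code would come from.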
A standalone x32 binary will run fine on an x64 machine. But if you want to link to any libraries, the library will also have to be x32. So an x64 system, which probably has 32-bit legacy libs as well as normal 64-bit ones, will also need a complete set of x32 libs for x32 to be practical.
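The three flavours can be told apart from the ELF header alone, which is roughly how a loader or package tool knows which set of libraries a binary needs. A minimal sketch, assuming little-endian files: x32 objects use the 32-bit ELF class together with the x86-64 machine type. The `classify_elf` and `fake_header` names are my own, and the fake headers only fill in the fields the check reads.

```python
import struct

ELFCLASS32, ELFCLASS64 = 1, 2   # e_ident[EI_CLASS] values
EM_386, EM_X86_64 = 3, 62       # e_machine values


def classify_elf(header: bytes) -> str:
    """Classify an ELF header as 'i386', 'x32', 'x86_64', 'other' or 'not-elf'."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not-elf"
    if header[5] != 1:  # EI_DATA: 1 means little-endian
        return "other"
    elf_class = header[4]
    (machine,) = struct.unpack_from("<H", header, 18)  # e_machine at offset 18
    if elf_class == ELFCLASS32 and machine == EM_386:
        return "i386"
    if elf_class == ELFCLASS32 and machine == EM_X86_64:
        return "x32"
    if elf_class == ELFCLASS64 and machine == EM_X86_64:
        return "x86_64"
    return "other"


def fake_header(elf_class: int, machine: int) -> bytes:
    """Build a 20-byte stand-in header for testing (not a loadable file)."""
    ident = b"\x7fELF" + bytes([elf_class, 1, 1]) + bytes(9)  # 16-byte e_ident
    return ident + struct.pack("<HH", 2, machine)             # e_type, e_machine


if __name__ == "__main__":
    for cls, mach in ((1, 3), (1, 62), (2, 62)):
        print(classify_elf(fake_header(cls, mach)))
```

To try it on a real file, pass the first 20 bytes, for example `classify_elf(open(path, "rb").read(20))`.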
Sure. But after the porting work has been done by the distribution vendor, it's done. The package manager software should be able to do whatever is necessary to almost transparently ensure any necessary dependencies are installed for the required sub-architecture. So I would imagine that in most cases, end-users won't notice any hassle except having the option to choose between x32 and x86_64 per package during installation. I think that sounds kind of neat :)
OK, modulo taking up extra space on already-cramped CD distros, taking more time to download updates, taking up more space on production hard disks, and having to download a new version of the software if your dataset grows over 4GB, it sounds good. :)
And extra memory taken up at runtime by having to load the other versions of the libraries. And the I/O costs of reading them in. I'd think that'd outweigh the performance benefits many times over in nearly all cases.
Note that there is no need to port every single program. You can still run a 64 bit userspace (or a 32 bit one for that matter), but for the apps where there is a major benefit use x32. This also means you don't have to port every library etc.
I wonder if there could be an alternative in hardware.
Could the cpu have a mode/flag where all pointers in cpu registers were treated as 32 bits (high 32 bits ignored)?
Alignment issues might make this impossible (loading a 32 bit address in a compatible way from memory/cache...) but it would be neat if it could work. For instance, if memory layout in 32 bit chunks were A... B..., there would have to be a way to load both A and B into the low 32 bits of a register.
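As a toy model of what I mean (pure simulation, not a description of any real CPU feature): pointers sit in memory as packed 32-bit slots, each load zero-extends one slot into a 64-bit register, and address generation ignores whatever is in the high half. The names `load_pointer` and `effective_address` are invented for this sketch.

```python
import struct

MASK32 = 0xFFFFFFFF


def load_pointer(memory: bytes, slot: int) -> int:
    """Load the 32-bit little-endian pointer in the given slot, zero-extended."""
    (value,) = struct.unpack_from("<I", memory, slot * 4)
    return value


def effective_address(register: int) -> int:
    """Address the hypothetical mode would use: the low 32 bits of the register."""
    return register & MASK32


if __name__ == "__main__":
    # Memory laid out as A... B... in 4-byte chunks.
    memory = struct.pack("<II", 0x00001000, 0x00002000)
    a = load_pointer(memory, 0)
    b = load_pointer(memory, 1)
    print(hex(a), hex(b))
    # Garbage in the high half would simply be ignored.
    print(hex(effective_address(0xDEADBEEF00001000)))
```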
I am not sure this is true generally: the JVM has been able to use "Compressed OOPS", i.e. 32-bit pointers to objects managed by the VM, on 64-bit platforms for a few years, even without any support from the OS.
Of course it's better when language/VM implementers take this issue into account, but for those who don't, just building the VM/interpreter for the x32 architecture may be the only workaround.
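For anyone who hasn't looked at how compressed references work, the idea is roughly this: store a 32-bit offset from the heap base, scaled by the object alignment, and widen it back to a full address on use. The sketch below assumes 8-byte object alignment and a made-up `HEAP_BASE`; it is an illustration of the technique, not the JVM's actual code.

```python
OBJECT_ALIGNMENT = 8                     # assumed: objects are 8-byte aligned
SHIFT = 3                                # log2(OBJECT_ALIGNMENT)
HEAP_BASE = 0x0000_7F00_0000_0000        # hypothetical heap start address
MAX_HEAP_BYTES = (1 << 32) << SHIFT      # reach of a 32-bit scaled reference


def compress(address: int) -> int:
    """Turn a 64-bit heap address into a 32-bit scaled reference."""
    offset = address - HEAP_BASE
    if offset < 0 or offset % OBJECT_ALIGNMENT != 0:
        raise ValueError("address is outside the heap or misaligned")
    ref = offset >> SHIFT
    if ref > 0xFFFFFFFF:
        raise ValueError("address is beyond the compressible range")
    return ref


def decompress(ref: int) -> int:
    """Recover the full 64-bit address from a 32-bit scaled reference."""
    return HEAP_BASE + (ref << SHIFT)


if __name__ == "__main__":
    addr = HEAP_BASE + 0x1_0000_0008     # more than 4 GiB into the heap
    ref = compress(addr)
    print(hex(ref), decompress(ref) == addr)
    print(MAX_HEAP_BYTES // 2**30, "GiB addressable")
```

With the 8-byte scaling, a 32-bit reference covers a heap larger than 4 GiB, at the cost of a shift and an add on every dereference.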