When people really need something reliable, there's practically no limit to how much effort can go into building a system with unfailing integrity and availability.
Take, for example, the lockstep facility on certain IBM processors:
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/29...
You can take two or more of them, run identical software on them, and compare their output on a cycle-by-cycle basis.
Now, the 750GX series may be a bit dated by now, but good luck achieving that level of paranoid system integrity with just about any truly modern system.
One thing that I think most colleges don't teach well is that a system's compute performance is not always the most important measure of its capability.
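
To make the lockstep idea above a bit more concrete, here is a rough software toy of "run identical work on several replicas and compare every cycle". It is only a sketch: the real facility does the comparison in hardware, and everything named here (`step`, `run_lockstep`, `faulty_step_factory`, the mixing constants) is made up for illustration and not taken from the IBM document.

```python
from typing import Callable, Iterable, List, Optional

Step = Callable[[int, int], int]


def step(state: int, inp: int) -> int:
    """Hypothetical deterministic 'cycle': a toy 32-bit mixing function."""
    return (state * 1103515245 + inp + 12345) & 0xFFFFFFFF


def run_lockstep(inputs: Iterable[int], replicas: List[Step], initial: int = 0) -> Optional[int]:
    """Feed every replica the same inputs and compare their outputs each cycle.

    Returns the index of the first cycle where the replicas disagree,
    or None if they agree on every cycle.
    """
    states = [initial] * len(replicas)
    for cycle, inp in enumerate(inputs):
        states = [replica(s, inp) for replica, s in zip(replicas, states)]
        if len(set(states)) > 1:  # any mismatch means something went wrong
            return cycle
    return None


def faulty_step_factory(fault_cycle: int) -> Step:
    """Build a replica that flips one output bit at a chosen cycle (simulated fault)."""
    count = 0

    def faulty(state: int, inp: int) -> int:
        nonlocal count
        out = step(state, inp)
        if count == fault_cycle:
            out ^= 1  # simulated single-bit upset
        count += 1
        return out

    return faulty
```

With two healthy replicas, `run_lockstep(range(10), [step, step])` finds no mismatch; swap one for `faulty_step_factory(3)` and the divergence is caught on the very cycle it happens, which is the whole point of comparing per cycle instead of only checking final results.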