Shader languages are also hell-bent on avoiding branches, so ``if`` is frowned upon and often not used. I could easily imagine a shader language not having it at all.
The old assembly-like languages (ARB_fragment_program, NV_fragment_program*, et al.) did indeed not have branches, only selection and conditional termination, because that was the extent of the capabilities of the underlying hardware. (I understand the execution on modern fragment processors can’t actually diverge within a single batch, either, so they execute both branches and select afterwards, but they are at least capable enough not to do that if the branch went the same way everywhere. But it’s been a long time since I’ve had a state-of-the-art GPU to play with.)
Still true. CUDA warps execute as groups of 32 threads, and if a branch diverges within a warp, the warp has to take both sides and then select the result. That's fine for loop termination like ``while (i < 1000)``, but if there is actual work on both sides it's often significantly better to switch to branchless code.
This is very much changing. IMHO, if you are doing shader language design today, you should let the programmer express things in the most natural way possible, and let the compiler figure out whether to generate a branch or branchless code. Yes, often you want the latter, but compilers are pretty good at figuring that out.
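For concreteness, here is a rough sketch of the two styles being compared. It is plain C++ rather than CUDA or a real shader language, and the function names and the 0.5 threshold are made up for illustration; the point is only the shape of the rewrite, not how any particular GPU compiler would lower it.

```cpp
// Branchy form: each element follows one of two paths. On SIMT hardware,
// lanes that disagree on the condition would force both paths to run.
float shade_branchy(float x) {
    if (x > 0.5f) {
        return x * 2.0f;
    } else {
        return x * 0.25f + 1.0f;
    }
}

// Branchless form: compute both candidates unconditionally, then blend
// them with a 0/1 mask so there is no control flow to diverge on.
float shade_branchless(float x) {
    const float bright = x * 2.0f;
    const float dark = x * 0.25f + 1.0f;
    const float mask = static_cast<float>(x > 0.5f);  // 1.0f or 0.0f
    return mask * bright + (1.0f - mask) * dark;
}
```

Both functions return the same value for finite inputs. The branchless one always pays for both computations, which is why it tends to win only when the two sides are cheap or when lanes would diverge anyway.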