Others have covered it here: using masks, multiplies, or horizontal operations to avoid branching by running the logic for all of the branches is common in vectorized code (and in shader code, for similar reasons), where raw compute outruns branch-misprediction costs on deeply pipelined processors.

On older SSE architectures, memory-alignment requirements and register-to-register result latencies sometimes made it easier (or outright better) to stride operations over partially masked registers rather than reorganize the data. With higher architectural register counts and newer shuffle instructions, that need has diminished.

Definitely avoid branching in tight compute code, though. Branching kills performance, sometimes in very data-dependent and annoying ways.
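
To make that concrete, here is a minimal sketch of the mask-and-blend pattern with SSE intrinsics. The kernel and names are purely illustrative - a per-lane select of the form out[i] = x[i] > t[i] ? a[i] : b[i] - but it shows both "branches" being evaluated and the comparison mask picking the result:

    #include <xmmintrin.h>  /* SSE */

    /* Branchless per-lane select for 4 floats:
     * out[i] = (x[i] > t[i]) ? a[i] : b[i].
     * Both candidate values are loaded/computed; the comparison mask
     * blends them, so no branch is ever taken. */
    void select4(const float *x, const float *t,
                 const float *a, const float *b, float *out)
    {
        __m128 vx   = _mm_loadu_ps(x);
        __m128 vt   = _mm_loadu_ps(t);
        __m128 va   = _mm_loadu_ps(a);
        __m128 vb   = _mm_loadu_ps(b);
        __m128 mask = _mm_cmpgt_ps(vx, vt);               /* all-ones lanes where x > t */
        __m128 res  = _mm_or_ps(_mm_and_ps(mask, va),     /* take a where mask is set   */
                                _mm_andnot_ps(mask, vb)); /* take b elsewhere           */
        _mm_storeu_ps(out, res);
    }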



“Avoid branches” isn’t strictly true in scalar code (and sometimes in vector code): zero folding leads to longer dependency chains, so if your branch is very predictable and doesn’t preclude other optimizations (i.e. vectorizing), the branch can be faster.

Your mileage may vary, depending on how much reordering capacity the hardware has, how good the branch predictor is, how expensive a mispredict is, whether the extra dependency chains are impactful, etc.
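
As a rough sketch of that trade-off (illustrative code, not a benchmark): the branchy version below can win when the condition is almost always predicted correctly, while the branchless version trades mispredicts for a compare-and-select feeding the critical dependency chain on s every iteration.

    /* Branchy: if the condition is highly predictable, the check is
     * nearly free and the add stays off the loop's critical path. */
    long sum_above_branchy(const int *v, int n, int cutoff)
    {
        long s = 0;
        for (int i = 0; i < n; i++)
            if (v[i] > cutoff)
                s += v[i];
        return s;
    }

    /* Branchless: no mispredicts, but every iteration now feeds a
     * select into the dependency chain on s. */
    long sum_above_branchless(const int *v, int n, int cutoff)
    {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += (v[i] > cutoff) ? v[i] : 0;
        return s;
    }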


To add to your last point: look at the optimised assembly and see if your code actually does contain branches. The compiler can see through certain if statements and replace them with cmov instructions. Replacing that with a complex sequence of arithmetic to get rid of the 'if' might make performance worse rather than better.
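
For example, something as simple as the following usually compiles to a cmov at -O2 on gcc/clang, so there is no branch left to remove - but that is exactly the kind of thing to confirm in the generated assembly rather than assume:

    /* Illustrative only: gcc/clang at -O2 typically lower this ternary
     * to cmp + cmov on x86-64 rather than a conditional jump. */
    int clamp_min(int x, int lo)
    {
        return (x < lo) ? lo : x;
    }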


No rule is strictly true in optimization, except the rule that no rule is strictly true in optimization...

It's quite often the case that branches are bad for performance in tight loops, so it's better to treat branches as likely sources of trouble rather than as no big deal.


I agree. I'm just saying, what is and is not a branch is not obvious.



