Others have covered it here: using masks, multiplies, or horizontal operations to avoid branching by running the logic for all of the branches is common in vectorized code (and in shader code, for similar reasons), where raw compute outruns branch-misprediction costs on deeply pipelined processors.

On older SSE architectures, memory-alignment requirements and register-to-register result latencies sometimes made it easier (or outright better) to stride operations over partially masked registers rather than reorganize the data. With higher architectural register counts and newer shuffle instructions, that need has diminished.

Definitely avoid branching in tight compute code, though. Branching kills performance, sometimes in very data-dependent and annoying ways.
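
To make that concrete, here is a minimal sketch of the mask-and-blend pattern with SSE intrinsics. The kernel and names are purely illustrative - a per-lane select of the form out[i] = x[i] > t[i] ? a[i] : b[i] - but it shows both "branches" being evaluated and the comparison mask picking the result:

    #include <xmmintrin.h>  /* SSE */

    /* Branchless per-lane select for 4 floats:
     * out[i] = (x[i] > t[i]) ? a[i] : b[i].
     * Both candidate values are loaded/computed; the comparison mask
     * blends them, so no branch is ever taken. */
    void select4(const float *x, const float *t,
                 const float *a, const float *b, float *out)
    {
        __m128 vx   = _mm_loadu_ps(x);
        __m128 vt   = _mm_loadu_ps(t);
        __m128 va   = _mm_loadu_ps(a);
        __m128 vb   = _mm_loadu_ps(b);
        __m128 mask = _mm_cmpgt_ps(vx, vt);               /* all-ones lanes where x > t */
        __m128 res  = _mm_or_ps(_mm_and_ps(mask, va),     /* take a where mask is set   */
                                _mm_andnot_ps(mask, vb)); /* take b elsewhere           */
        _mm_storeu_ps(out, res);
    }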



“Avoid branches” isn’t strictly true in scalar code (and sometimes in vector code): zero folding leads to longer dependency chains, so if your branch is very predictable and doesn’t preclude other optimizations (i.e. vectorizing), the branch can be faster.

Your mileage may vary, depending on how much reordering capacity the hardware has, how good the branch predictor is, how expensive a mispredict is, whether the extra dependency chains are impactful, etc.
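
As a rough sketch of that trade-off (illustrative code, not a benchmark): the branchy version below can win when the condition is almost always predicted correctly, while the branchless version trades mispredicts for a compare-and-select feeding the critical dependency chain on s every iteration.

    /* Branchy: if the condition is highly predictable, the check is
     * nearly free and the add stays off the loop's critical path. */
    long sum_above_branchy(const int *v, int n, int cutoff)
    {
        long s = 0;
        for (int i = 0; i < n; i++)
            if (v[i] > cutoff)
                s += v[i];
        return s;
    }

    /* Branchless: no mispredicts, but every iteration now feeds a
     * select into the dependency chain on s. */
    long sum_above_branchless(const int *v, int n, int cutoff)
    {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += (v[i] > cutoff) ? v[i] : 0;
        return s;
    }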


To add to your last point: look at the optimised assembly and see if your code actually does contain branches. The compiler can see through certain if statements and replace them with cmov instructions. Replacing that with a complex sequence of arithmetic to get rid of the 'if' might make performance worse rather than better.
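
For example, something as simple as the following usually compiles to a cmov at -O2 on gcc/clang, so there is no branch left to remove - but that is exactly the kind of thing to confirm in the generated assembly rather than assume:

    /* Illustrative only: gcc/clang at -O2 typically lower this ternary
     * to cmp + cmov on x86-64 rather than a conditional jump. */
    int clamp_min(int x, int lo)
    {
        return (x < lo) ? lo : x;
    }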


No rule is strictly true in optimization, except the rule that no rule is strictly true in optimization...

It's quite often the case that branches are bad for performance in tight loops, so it's better to treat branches as likely sources of trouble rather than as no big deal.


I agree. I'm just saying, what is and is not a branch is not obvious.



