I'm not following: as long as you are introducing a new, incompatible instructio...

mananaysiempre · on July 11, 2024

When it comes to TZCNT vs BSF, they are just compatible enough for a new compiler to use unconditionally (if we assume that BSF with a zero input leaves its output register unchanged, as it has for decades, and as documented by AMD who defined LZCNT): the instruction sequence

  MOV   ECX, 32
  TZCNT ECX, EAX ; i.e. REP BSF ECX, EAX

behaves identically on everything from the original 80386 up and is better on superscalars with TZCNT support due to avoiding the false dependency on ECX. The reason for that is that BSF with a nonzero input and TZCNT with a nonzero input have exactly the same output. That’s emphatically not true of BSR and LZCNT, so we’re stuck relegating LZCNT to compiler flags.

aengelke · on July 11, 2024

TZCNT and BSF are not completely identical even for non-zero input: BSF sets the ZF when the input is zero, TZCNT sets ZF when the output is zero (i.e., the least significant bit is set).