The semantics of LZCNT combined with its encoding feels like an own goal: it’s e...

BeeOnRope · on July 11, 2024

I'm not following: if you are introducing a new, incompatible instruction for leading-zero counting anyway, you would choose LZCNT semantics over BSR, since LZCNT has clearly won in retrospect as the primitive for this use case. BSR is just a historical anomaly with a zero-input problem and no compensating benefit.

What would be the point of offering a new variation of BSR with different input semantics?

mananaysiempre · on July 11, 2024

When it comes to TZCNT vs BSF, they are just compatible enough for a new compiler to use unconditionally (if we assume that BSF with a zero input leaves its output register unchanged, as it has for decades, and as documented by AMD who defined LZCNT): the instruction sequence

  MOV   ECX, 32
  TZCNT ECX, EAX ; i.e. REP BSF ECX, EAX

behaves identically on everything from the original 80386 up and is better on superscalars with TZCNT support due to avoiding the false dependency on ECX. The reason for that is that BSF with a nonzero input and TZCNT with a nonzero input have exactly the same output. That’s emphatically not true of BSR and LZCNT, so we’re stuck relegating LZCNT to compiler flags.
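To make the comparison above concrete, here is a minimal sketch in C that models the four instructions in software. The function names (`model_bsf`, `model_tzcnt`, `model_bsr`, `model_lzcnt`, `demo_bit_scan`) are hypothetical, and the BSF/BSR models assume, as the comment does, that a zero source leaves the destination unchanged. It is only an illustration of the semantics, not of how any CPU implements them.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BSF: index of the lowest set bit.
   Assumption: a zero source leaves the destination unchanged. */
uint32_t model_bsf(uint32_t dest, uint32_t src) {
    if (src == 0) return dest;
    uint32_t i = 0;
    while (((src >> i) & 1u) == 0) i++;
    return i;
}

/* Model of TZCNT: number of trailing zeros; 32 for a zero source.
   The old destination value is never read. */
uint32_t model_tzcnt(uint32_t dest, uint32_t src) {
    (void)dest;
    if (src == 0) return 32;
    uint32_t i = 0;
    while (((src >> i) & 1u) == 0) i++;
    return i;
}

/* Model of BSR: index of the highest set bit.
   Same zero-source assumption as model_bsf. */
uint32_t model_bsr(uint32_t dest, uint32_t src) {
    if (src == 0) return dest;
    uint32_t i = 31;
    while (((src >> i) & 1u) == 0) i--;
    return i;
}

/* Model of LZCNT: number of leading zeros; 32 for a zero source. */
uint32_t model_lzcnt(uint32_t dest, uint32_t src) {
    (void)dest;
    if (src == 0) return 32;
    uint32_t n = 0;
    while (((src << n) & 0x80000000u) == 0) n++;
    return n;
}

/* Mirrors "MOV ECX, 32" followed by the instruction under test. */
void demo_bit_scan(void) {
    const uint32_t samples[] = {0u, 1u, 2u, 0x80u, 0x80000000u, 0xFFFFFFFFu};
    for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++) {
        uint32_t x = samples[k];
        /* BSF and TZCNT agree for every input once ECX is preloaded with 32. */
        assert(model_bsf(32, x) == model_tzcnt(32, x));
        /* BSR and LZCNT are related by 31 - n for nonzero inputs,
           so the raw results are never equal. */
        if (x != 0) {
            assert(model_bsr(32, x) == 31 - model_lzcnt(32, x));
            assert(model_bsr(32, x) != model_lzcnt(32, x));
        }
    }
}
```

Because 31 - n is odd whenever n is an integer result paired with its complement, the BSR and LZCNT results can never coincide for a nonzero input, which is why the same "preload and reinterpret" trick does not carry over.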

aengelke · on July 11, 2024

TZCNT and BSF are not completely identical even for non-zero input: BSF sets the ZF when the input is zero, TZCNT sets ZF when the output is zero (i.e., the least significant bit is set).
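The flag difference can be sketched as a small software model in C. The function names here are hypothetical; the CF line reflects the documented TZCNT behaviour of setting CF when the source is zero, which is where the "input was zero" signal moves to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BSF: ZF reports whether the *input* was zero. */
bool zf_after_bsf(uint32_t src) { return src == 0; }

/* TZCNT: ZF reports whether the *output* is zero,
   i.e. whether bit 0 of the input is set. */
bool zf_after_tzcnt(uint32_t src) { return (src & 1u) != 0; }

/* TZCNT: CF reports whether the input was zero. */
bool cf_after_tzcnt(uint32_t src) { return src == 0; }

void demo_flags(void) {
    /* Zero input: BSF sets ZF, TZCNT sets CF and clears ZF. */
    assert(zf_after_bsf(0) && !zf_after_tzcnt(0) && cf_after_tzcnt(0));
    /* Input 1: same numeric result (0), opposite ZF. */
    assert(!zf_after_bsf(1) && zf_after_tzcnt(1));
    /* Input 2: both leave ZF clear. */
    assert(!zf_after_bsf(2) && !zf_after_tzcnt(2));
}
```

So code that branches on ZF after the instruction (for example a `JZ` meant to catch a zero input) would behave differently depending on whether the CPU decodes the bytes as BSF or TZCNT, even though the destination register matches.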

bonzini · on July 10, 2024

Yes, it's like someone looked at TZCNT and thought "let's encode LZCNT the same way", but it makes no sense.

adrian_b · on July 11, 2024

LZCNT and TZCNT are corrections (originally introduced by AMD) for the serious mistake made by the designers of the Intel 80386 when they defined BSF and BSR.

Because the wrong definition for the null input did not matter much on the very slow 80386, they failed to foresee how bad it would become on future pipelined and superscalar CPUs, where having to insert a test for null input can slow down a program many times over.

Nevertheless, they should have paid more attention to the earlier use of such instructions. For instance, the Cray-1 had defined LZCNT the right way almost ten years earlier.

bonzini · on July 11, 2024

LZCNT was introduced by AMD (as part of ABM), while TZCNT was introduced by Intel in BMI1.