
The semantics of LZCNT combined with its encoding feel like an own goal: it’s encoded as a BSR instruction with a legacy-ignored prefix, but for nonzero inputs its return value is the operand size minus one, minus the return value of the legacy version (for 32-bit operands, LZCNT returns 31 minus what BSR returns). Yes, clz() is a function that exists, but the extra subtraction in its implementation feels like a small cost to pay for extra compatibility when LZCNT could’ve just been BSR with different zero-input semantics.
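To make the relationship concrete, here is a minimal sketch in portable C that models both instructions in software for 32-bit operands. The function names model_bsr32 and model_lzcnt32 are made up for illustration; they are not intrinsics, and the model refuses a zero input to BSR rather than guessing what the destination would hold.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of BSR for a nonzero 32-bit input:
   returns the index (0..31) of the highest set bit. */
unsigned model_bsr32(uint32_t x) {
    assert(x != 0); /* zero input is outside what this model covers */
    unsigned idx = 0;
    while (x >>= 1) {
        idx++;
    }
    return idx;
}

/* Software model of LZCNT: number of zero bits above the highest
   set bit, or 32 when the input is zero. */
unsigned model_lzcnt32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31;
         mask != 0 && (x & mask) == 0;
         mask >>= 1) {
        n++;
    }
    return n;
}
```

For any nonzero x, model_lzcnt32(x) equals 31 - model_bsr32(x), so the same encoding produces different results depending on whether the CPU honors the prefix.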


I'm not following: as long as you are introducing a new, incompatible instruction for leading zero counting, you'd definitely choose LZCNT over BSR as LZCNT has definitely won in retrospect over BSR as the primitive for this use case. BSR is just a historical anomaly which has a zero-input problem for no benefit.

What would be the point of offering a new variation of BSR with different zero-input semantics?


When it comes to TZCNT vs BSF, they are just compatible enough for a new compiler to use unconditionally (if we assume that BSF with a zero input leaves its output register unchanged, as it has for decades, and as documented by AMD who defined LZCNT): the instruction sequence

  MOV   ECX, 32
  TZCNT ECX, EAX ; i.e. REP BSF ECX, EAX
behaves identically on everything from the original 80386 up and is better on superscalars with TZCNT support due to avoiding the false dependency on ECX. The reason for that is that BSF with a nonzero input and TZCNT with a nonzero input have exactly the same output. That’s emphatically not true of BSR and LZCNT, so we’re stuck relegating LZCNT to compiler flags.
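A rough software model of that two-instruction sequence, written in C, shows why both decodings agree. It bakes in the assumption stated above, namely that BSF leaves its destination unchanged on a zero input; all function names here are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Legacy decoding: the REP prefix is ignored and the CPU runs BSF.
   Assumption: a zero source leaves the destination unchanged. */
unsigned model_bsf32(uint32_t src, unsigned dest) {
    if (src == 0) {
        return dest;
    }
    unsigned idx = 0;
    while ((src & 1u) == 0) {
        src >>= 1;
        idx++;
    }
    return idx;
}

/* New decoding: TZCNT returns the operand size for a zero source
   and never reads the old destination value. */
unsigned model_tzcnt32(uint32_t src) {
    if (src == 0) {
        return 32;
    }
    unsigned n = 0;
    while ((src & 1u) == 0) {
        src >>= 1;
        n++;
    }
    return n;
}

/* MOV ECX, 32 ; REP BSF ECX, EAX on a CPU without TZCNT. */
unsigned sequence_on_legacy_cpu(uint32_t eax) {
    unsigned ecx = 32;
    return model_bsf32(eax, ecx);
}

/* The same bytes on a CPU with TZCNT: the MOV result is overwritten. */
unsigned sequence_on_tzcnt_cpu(uint32_t eax) {
    unsigned ecx = 32;
    ecx = model_tzcnt32(eax);
    return ecx;
}
```

Under that assumption the two sequence functions return the same value for every input, including zero, which is the property that lets a compiler emit the sequence unconditionally.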


TZCNT and BSF are not completely identical even for non-zero input: BSF sets ZF when the input is zero, while TZCNT sets ZF when the output is zero (i.e., when the least significant bit of the input is set).
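The flag difference can be sketched as two tiny C predicates. These only model ZF as described in this comment; the names are invented for illustration and other flags are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF after BSF: set when the source operand is zero. */
bool zf_after_bsf(uint32_t src) {
    return src == 0;
}

/* ZF after TZCNT: set when the result is zero, which happens
   exactly when bit 0 of the source is set. */
bool zf_after_tzcnt(uint32_t src) {
    return (src & 1u) != 0;
}
```

So code that branches on ZF after the instruction sees different behavior for an input such as 1 (ZF clear after BSF, set after TZCNT), even though the destination register holds 0 in both cases.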


Yes, it's like someone looked at TZCNT and thought "let's encode LZCNT the same way", but it makes no sense.


LZCNT and TZCNT are corrections (originally introduced by AMD) for the serious mistake made by the designers of the Intel 80386 when they defined BSF and BSR.

Because the wrong definition for a zero input did not matter much on the very slow 80386, they failed to foresee how bad it would become on future pipelined and superscalar CPUs, where having to insert a test for a zero input can slow a program down many times over.

Nevertheless, they should have paid more attention to the earlier use of such instructions. For instance, the Cray-1 had defined LZCNT the right way almost ten years earlier.


LZCNT was introduced by AMD as part of ABM, while TZCNT came later as part of BMI1.



