Yes, this library is of uneven quality and would benefit from a few hours of focused attention from a specialist. For example, the various HighestBit functions below the one you called out also look inefficient compared with something built on one of the __builtin_clz intrinsics, even though __builtin_clz is already used earlier in the file...
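A minimal sketch of the difference, assuming HighestBit means "index of the most significant set bit, or -1 for zero". The real functions' signatures and zero handling may differ, and highest_bit_loop, highest_bit_clz and highest_bit_clz64 are hypothetical names. __builtin_clz and __builtin_clzll are GCC/Clang builtins whose result is undefined for a zero argument, hence the explicit check.

```c
#include <stdint.h>

/* Hypothetical loop-based version: one iteration per significant bit,
   so up to 32 iterations. Returns the index of the highest set bit, or -1 for 0. */
static int highest_bit_loop(uint32_t x) {
    int index = -1;
    while (x != 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* Intrinsic-based version: __builtin_clz counts leading zero bits.
   Its result is undefined for 0, so that case is handled explicitly. */
static int highest_bit_clz(uint32_t x) {
    if (x == 0) {
        return -1;
    }
    return 31 - __builtin_clz(x);
}

/* 64-bit variant using __builtin_clzll (also undefined for 0). */
static int highest_bit_clz64(uint64_t x) {
    if (x == 0) {
        return -1;
    }
    return 63 - __builtin_clzll(x);
}
```

On x86 the intrinsic versions typically compile to a single bsr or lzcnt instruction plus a zero check, instead of a data-dependent loop.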
I think a header like this often comes into being because someone has a narrow need for an operation that is deemed reusable and abstract. Nobody needs the 64-bit version of a 32-bit bit utility at the time, so it doesn't get written. Then somebody comes along later and tries to fill in the gaps.
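On many platforms, shifting a 32-bit value by 32 or more bits often did nothing, even though you would expect the result to be zero (in C, such a shift is undefined behavior). Intel x86, for example, masks the shift count of a 32-bit operand to 5 bits, which broke a lot of 64-bit compiler hacks.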
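To make the shift-count problem concrete, here is a minimal sketch of a 64-bit left shift built from 32-bit shifts, assuming the value is kept as two 32-bit halves (u64_pair and shl64_pair are hypothetical names, not taken from the library under discussion). The key point is that every individual 32-bit shift count stays in the range 0..31. A naive hi = (hi << n) | (lo >> (32 - n)) goes wrong at n == 0: the shift by 32 is undefined in C, and if the compiler emits a plain x86 shift, the masked count becomes 0 and lo leaks into the high word instead of contributing nothing.

```c
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves (hypothetical representation). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* Shift left by n. Counts of 64 or more produce 0, like a mathematical shift.
   Each branch keeps every 32-bit shift count within 0..31, so no shift
   is undefined and x86 count masking never changes the result. */
static u64_pair shl64_pair(u64_pair v, unsigned n) {
    u64_pair r;
    if (n == 0) {
        r = v;                                   /* avoids lo >> 32 */
    } else if (n < 32) {
        r.hi = (v.hi << n) | (v.lo >> (32 - n)); /* 32 - n is in 1..31 */
        r.lo = v.lo << n;
    } else if (n < 64) {
        r.hi = v.lo << (n - 32);                 /* n - 32 is in 0..31 */
        r.lo = 0;
    } else {
        r.hi = 0;
        r.lo = 0;
    }
    return r;
}
```

Splitting the cases costs a few branches, which is exactly the kind of overhead that might tempt someone into the naive one-liner.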
There is also the question of cache footprint and speed: widening everything to the full 64-bit size changes how much memory the data occupies, so perhaps they benchmarked that too.
The code as it stands is likely the result of working around those problems.