Monday, 26 October 2015

Common bit prefix length for two integers

Here's a neat trick I discovered a couple of months back: given two signed or unsigned integers of the same bit width, you can calculate the length of their common prefix very efficiently:

  int common_prefix_length(int a, int b)
    return __builtin_clz(a ^ b);

What's this doing? Let's break it down:

a ^ b is the bitwise-xor of a and b. The boolean xor operation is true when one of it's inputs is true and the other is false; or false if both have the same value. Another way to put this is that xor returns true when its inputs are different and false if they're the same. The bitwise-xor operation then, returns a value which has zeros for every bit that is the same in both a and b; and ones for every bit that's different.

__builtin_clz is a GCC intrisinc function which counts the number of leading zero bits of its argument. It compiles down to a single machine code instruction on hardware that supports it (which includes every Intel chip made in this decade). The Visual C++ equivalent is the _BitScanReverse intrinsic, which has a slightly more complicated API; implementing the above with it is left as an exercise for the reader. :-)

Passing the result of a ^ b to __builtin_clz means we're counting the leading zero bits in a number where a zero bit indicates that the corresponding bits in a and b had the same value; which is exactly how we get the length of the common prefix.

You can get the common suffix in the same way. The only difference is that you use the __builtin_ctz intrinsic (Visual C++: _BitScanForward) instead, to count the trailing zero bits:

  int common_suffix_length(int a, int b)
    return __builtin_ctz(a ^ b);

Neat huh?

No comments:

Post a Comment