EnPepeGamer: January 2017

Counting bits is a very important task in many problems, like data compression. In a computer all the numeric values are stored in binary format. For example, if we want to encode the number 212 in a 32 bits register it will look like this:

0000 0000 0000 0000 0000 0000 1101 0100

The operation that counts the number of bits set to 1 in a register is called bitcount() or popcount(). We can also employ the definition of the Hamming weight, which is the number of elements in a vector that are different to the 0 symbol. In our example:

bitcount(212) = 4

The implementation of this operation depends on the architecture of the CPU in which we run our program. Sometimes, the CPU will have a instruction implementing the bitcount() at hardware level. The instructions POPCNT and LZCNT are part of the ABM (Advanced Bit Manipulation) set. These instructions are included in the x86 architecture in the SSE 4.2 instruction set.

If we have access to a hardware instruction this will be our best option, because we will obtain the result in few execution cycles. If we are programming in a high level language we must check the necessary steps to use the hardware instruction (normally we must call an specific function and compile the program with an special flag).

Nevertheless, there are situations in which we do not have access to a hardware instruction. For such cases there is a good general solution that employs bitwise operations (before going forward you should have knowledge of hexadecimal numeric representations). The following code applies for 32 bits registers:

inline uint32_t popcount(uint32_t i) {
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
}

This algorithm is known as HAKMEM, developed by Beeler, Gosper and Schroppel in 1972. As you can see the code is a bit complex, so you can add a comment stating "//Just works" or "//Magic follows". Another option is to keep reading a bit further, so you can blame me later if I do not explain it properly.

HAKMEM employs a divide and conquer approach to count the bits in parallel. Let's see how it works by following the example with the number 212.

1) Shift the initial register one bit to the right i >> 1:

0000 0000 0000 0000 0000 0000 1101 0100
----------------------------------------
0000 0000 0000 0000 0000 0000 0110 1010

2) Apply the AND function with the mask 0x55555555 ( i >> 1 ) & 0x55555555:

0000 0000 0000 0000 0000 0000 0110 1010

0101 0101 0101 0101 0101 0101 0101 0101
----------------------------------------
0000 0000 0000 0000 0000 0000 0100 0000

3) Subtract the result to the original value i - ( ( i >> 1 ) & 0x55555555 ):

0000 0000 0000 0000 0000 0000 1101 0100

0000 0000 0000 0000 0000 0000 0100 0000
----------------------------------------
0000 0000 0000 0000 0000 0000 1001 0100

With this three initial steps we have obtained the total number of bits set to one in every pair of bits of the original register. If we compare the last 8 bits of the original register with our current result:

11 01 01 00

10 01 01 00

We can see that the pair 11 has 2 bits set to one (10). It is followed by to pairs 01 with 1 bit set to one (01) and a pair 00 without any bit set (00).

The next operation adds these values in pairs, obtaining the number of bits set to one in each group of 4 bits, let's see how:

4) First we apply the bit mask 0x33333333 to our last value i & 0x33333333:

0000 0000 0000 0000 0000 0000 1001 0100

0011 0011 0011 0011 0011 0011 0011 0011

----------------------------------------

0000 0000 0000 0000 0000 0000 0001 0000

5) Then we shift the last value two bits to the right and apply the mask (i >> 2) & 0x33333333:

0000 0000 0000 0000 0000 0000 1001 0100
----------------------------------------
0000 0000 0000 0000 0000 0000 0010 0101

0011 0011 0011 0011 0011 0011 0011 0011

----------------------------------------

0000 0000 0000 0000 0000 0000 0010 0001

6) We add the two values we obtained:

0000 0000 0000 0000 0000 0000 0001 0000

0000 0000 0000 0000 0000 0000 0010 0001

----------------------------------------

0000 0000 0000 0000 0000 0000 0011 0001

Now we have the total number of ones in each group of four bits. If we compare the original value of the last 8 bits of 112 with our result:

1101 0100

0011 0001

We can see that the first group 1101 has 3 bits set to one (0011) and that the second group 0100 has only a bit set to one (0001).

The next steps obtain the total number of ones in each group of 8 bits (a byte).

7) We add i + (i >> 4):

0000 0000 0000 0000 0000 0000 0011 0001

0000 0000 0000 0000 0000 0000 0000 0011

----------------------------------------

0000 0000 0000 0000 0000 0000 0011 0100

8) We apply the mask of bits 0xF0F0F0F (i + (i >> 4)) & 0xF0F0F0F:

0000 0000 0000 0000 0000 0000 0011 0100

0000 1111 0000 1111 0000 1111 0000 1111

----------------------------------------

0000 0000 0000 0000 0000 0000 0000 0100

If we compare the original value of the last byte of our original register:

11010100
00000100

We see that it is correct, 11010100 contains 4 bits (00000100).

Right now you may already understand how the algorithm works. For our example we already finished, because it only uses the 8 least significant bits. However, if you have managed to read until here there is a surprise. There is a mathematical property that can be used to calculate the bitcount of the 32 bit register in one step.

Multiplying by 0x1010101 has a special property. If the number that we multiply has four bytes (A, B, C, D), then the result of the multiplication will return a result whose 4 bytes will be (A+B+C+D, B+C+D, C+D, D). The first byte of the result contains the bit count of the 32 bit register. Now we only have to shift this result 24 bits to the right >> 24 to obtain the result.

We finished! I hope you enjoyed this as much as me preparing it. Comments, questions, knifes and shurikens ... are very welcome.

Some references:

Stack overflow

University of Standford page

Hackers delight, chapter 5

Bitwise operations

Related entries:

Inverse square root rerevisited

Atom is a modern text editor developed in Javascript by the people responsible of GitHub. In their blog you can find many tips and tricks to enhance it. I have tried this editor and works really nicely. It has a good integration with GitHub. They have done a really good promotional video with vintage aesthetics. You should have a look:

One of the best things about this editor is that it has many plug-ins that will help us to develop in Rust, the new systems language developed by Mozilla. And it works really well!!! If we set these plug-ins properly we will have code auto-completion and error highlighting. I have made a guide to help you setting up everything. This is the list of plug-ins I have installed so far and the commands needed:

language-rust - Rust language support in Atom.

apm install language-rust

linter-rust - Lint rust files, using rustc and cargo. Checks for errors in the code

apm install linter

apm install linter-rust

racer - Intelligent rust code completion. This one is bit more tricky, you need also the racer binary:

Check if language-rust plug-in is installed

Follow the steps on github to install racer or use the binary provided by mozilla

apm install racer

Download the rust source code and extract it

Configure racer from atom: Set the paths to the racer executable and the Rust source code

rust-api-docs-helper - Allows to check the Rust standard library automatically

apm install rust-api-doc-helper

You get the idea. It is very easy!!! Once installed the plug-ins can be updated automatically.

I hope you like this entry. Please post your comments with your experiences and share this post with your friends. Have you tried other alternatives to develop in Rust?

EnPepeGamer

Tabs

Monday 9 January 2017

Counting bits

Thursday 5 January 2017

The Atom text editor and Rust

Hello everyone

Anxiety and Starcraft II