« Code Kata One - Supermarket Pricing | Main | Code Kata Fifteen -- A Diversion »

January 28, 2007

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d83451c41c69e200d8353d4a3153ef

Listed below are links to weblogs that reference Code Kata--How It Started:

Comments

karl

I like your idea of code Kata. It is something I been doing a lot of lately.

Your bit counting code problem gets used a lot in interview questions. Also it got discussed in Jon Bentley's great book "Programing Perls".

Anyway there is way to get a 10x speed up from your suggested solution. I will put the answer up tomorrow. Hint: think about using pregenerated tables.

karl

So one way is treat your very large bitfield as an array of bytes. A btye only has 256 possible values. You could make a table called count with 256 values. Each count element is the number of high bits of its index.

count[0] = 0; // 0000
count[1] = 1; // 0001
count[2] = 1; // 0010
count[3] = 2; // 0011
...
count[255] = 8;

You pregenerate the count table.


Then your function can be

int CountHigh( byte[] bitset )
{
int sum = 0;
for( i=0; i sum += count[ bitset[i] ];

return sum;

}

There are few ways to make loop go faster but it mostly depends on the implementing language. One option is to treat is the bitset as an array of word. But then Count has to have 64K elements.

Karl


Erik Søe Sørensen

Given that the bit vector are very sparse, and that the common "x=0" special case has already been taken care of, the gain from using a lookup table would probably not be worth the amount of code needed for specifying the lookup table and/or the code for generating it.

However, special casing the "1-bit-set" case would be simple:
...
next if x.zero?
{count+=1; next} if (x & (x-1)).zero?
....
(I hope the syntax is right).
This may yield another speed benefit; I haven't done any measurements (which is again the point of the article, I know...). Yet.

Erik Søe Sørensen

Performance numbers - for Perl:

If the baseline is Hacker's Delight counting method, with 0-bit check:
Adding the 1-bit special case gives 9% better performance for 10% bit saturation,
a 12% time decrease for 20% saturation,
and 4% for 30% bit saturation.

You can even use the "clear the lowest bit" trick exclusively, like this:
while x>0: {count+=1; x = x & (x-1);}
This is substantially faster (15%) than both the naïve and the HD counting method, even with zero-checking.

I'd be interested in hearing numbers for other languages, of course.
(My guess is that in C or similar, it might pay of to not branch too much - so HD with zero-checking would probably be the winner.)

Erik Søe Sørensen

Sorry, the test setup used above wasn't right... for one thing, the timings included a vector 'and'; for another, the bit densities were accordingly sparse.

New and better numbers:

Bit density | A->B | A->Z | H->J | H->Z | T->Z // Time reduction
5% | 38% | 87% | 23% | 32% | 28%
10% | 13% | 83% | 2% | 16% | -2%
25% | -2% | 72% | -13% | -48% | -94%
50% | -3% | 55% | -17% |-144% | -212%

where
A=naive counting w/ zero case
B=A + 1-bit special case
H=Hacker's Delight bit counting w/ zero case
J=H + 1-bit special case
T=Table version (256 entry table)
Z=counting by repeatedly clearing the lowest bit

(sorry about the formatting. Seems the blog system doesn't support anything as fancy as pre-formatted text.)

Conclusion: the code-wise simple (but somewhat sophisticated) 'Z' algorithm is consistently far better than the naïve 'A'. Perhaps more surprisingly, it even beats 'H' on low bit densities.
But to beat the table-driven algorithm, it needs bit densities below 10%.

Bill K

If it comes back almost instantly anyway, why is your focus speed and not readability?

It seems like practicing for speed is almost counter productive because you will be even more adapt at (and likely to use) cryptic patterns that trade an imperceptible amount of runtime speed for some serous debug time the next time a new player hits your code...

Jolie

Thank you for this article. I've found something similar by means of rapidshare search engine ( http://rapidqueen.com ), but they didn't help me much. Your article was of more help to me )

download search

Usefule thoughts, have found some of the things I did not know about dealing with chunking the Bignum into 30-bit words. Thanks a lot for sharing.

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment