Ruby: count the number of 1's in a binary number
I have a binary number (52 bits) represented as a string "01100011...."
What would be the quickest way to count the number of 1's?
"01100011....".count("1")
obviously works but is quite time consuming if this operation needs to be done thousands of times.
ok, some more info. I am trying to create bit vectors for words as follows
def bit_vec(str)
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
bv = ""
alphabet.each_char do |a|
if str.include?(a)
bv += "1"
else
bv += "0"
end
end
bv
end
The bit_vec method gets called about 170K times. I store the bit vectors in a hash and use them to find similar words for a given word by XOR'ing the bit vectors and counting the number of 1's (more 1's == less similarity). If the count method does not use String#scan the what else could use it?
开发者_运维知识库I know Ruby is slower than say C or Java. I am just looking to improve the algorithm the best I can. I am not looking for raw speed.
Maybe the include? method is the bottleneck?
Note that the problem of counting 1-bits is referred to as a "population count".
At least in Ruby, stick with handling these as a string via the count
method unless you have a compelling reason to use integers.
count
:
Benchmark: 78.60s for 10,000,000 iterations (127,225.63 iterations per second)
For integer math,
If you don't care about values above 2**32
,
def popcount(x)
m1 = 0x55555555
m2 = 0x33333333
m4 = 0x0f0f0f0f
x -= (x >> 1) & m1
x = (x & m2) + ((x >> 2) & m2)
x = (x + (x >> 4)) & m4
x += x >> 8
return (x + (x >> 16)) & 0x3f
end
Benchmark: 105.73s for 10,000,000 iterations (94,579.03 iterations per second)
If you do care about values above 2**32
,
def popcount(x)
b = 0
while x > 0
x &= x - 1
b += 1
end
return b
end
Benchmark: 365.59s for 10,000,000 iterations (27,353.27 iterations per second)
Addenda:
Your code:
Benchmark: 78.25s for 1,000,000 iterations (12,779.56 iterations per second)
This code:
def bit_vec(str)
# alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
bv = "0" * 52
str.each_char do |c|
ord = c.ord
next unless (ord >= 65 && ord <= 90) || (ord >= 97 && ord <= 122)
index = ord - 65
index -= 6 if index > 25
bv[index] = "1"
break if bv == "1111111111111111111111111111111111111111111111111111"
end
bv
end
Note: You said that you were dealing with a 52-bit number, so I assumed that you cared about both upper and lower case letters (26 + 26 = 52). I opted to check for uppercase first because that's how they appear in pretty much every character set ever, making the calculations a little easier.
Benchmark: 24.86s for 1,000,000 iterations (40,231.60 iterations per second)
3.14x speed-up.
You are going to have O(n)
performance, no matter what. Try this simple ruby command. Measure if it's really a problem.
This simple script, measured with time ruby test.rb
took 0.058 CPU seconds. This is on an old 1.25 Ghz processor. Are you really sure this operation is too slow?
10000.times do
"0100010000100001111101101000111110000001101001010".count("1")
end
If that isn't fast enough write a C extension. Try to avoid using conditionals. Write it like this:
count = 0;
for (i = stringLength; i; i++) {
count += string[i] - '0'; // no conditional used.
}
But honestly, if you need that kind of speed ruby is the wrong language for you. There are so many different things in ruby that take much more time than a simple .count("1")
.
from http://www.bergek.com/2009/03/11/count-number-of-bits-in-a-ruby-integer/
yourString.scan(/1/).size
from http://snippets.dzone.com/posts/show/4233
count = 0
count += byte & 1 and byte >>= 1 until byte == 0
Here is a post with different loops (in c) for counting based on the density of 0's vs. 1's
http://gurmeetsingh.wordpress.com/2008/08/05/fast-bit-counting-routines/
Here is another benchmark: https://gist.github.com/knugie/3865903
Simply run it on your machine if you're in doubt.
Ruby should not be used for maximum optimization, but checking for bottlenecks in your code is always reasonable. An Algorithm that works well in one domain does not necessarily work well in another one. Try to use real data from your application for optimization.
Sample OUTPUT:
$ ruby bit_count_benchmark.rb CPU : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz MEM : 3083288 kB RUBY : ruby-1.9.2-p320 "NORM": TEST... OK BENCHMARK (2000000): PREPARE... OK RUN... user system total real scan_string 227.770000 0.250000 228.020000 (227.912435) scan_regex 214.500000 0.220000 214.720000 (214.635405) progressive_right_shift 43.420000 0.030000 43.450000 ( 43.412643) continuous_right_shift 39.340000 0.010000 39.350000 ( 39.345163) count_string 19.910000 0.030000 19.940000 ( 19.932677) access_bit_fast 18.310000 0.040000 18.350000 ( 18.345740) bit_elimination_for 16.400000 0.010000 16.410000 ( 16.388461) bit_elimination_until 14.650000 0.000000 14.650000 ( 14.650187) bit_elimination_while 14.610000 0.000000 14.610000 ( 14.604845) pre_compute_16 4.370000 0.000000 4.370000 ( 4.371228) "NORM" FINISHED "LOTTO": TEST... OK BENCHMARK (2000000): PREPARE... OK RUN... user system total real scan_string 92.900000 0.100000 93.000000 ( 92.947647) scan_regex 79.500000 0.230000 79.730000 ( 79.671581) progressive_right_shift 43.430000 0.010000 43.440000 ( 43.424880) continuous_right_shift 35.360000 0.020000 35.380000 ( 35.360854) count_string 19.210000 0.020000 19.230000 ( 19.215173) access_bit_fast 17.890000 0.000000 17.890000 ( 17.890401) bit_elimination_for 5.680000 0.010000 5.690000 ( 5.680348) bit_elimination_until 5.040000 0.010000 5.050000 ( 5.054189) bit_elimination_while 5.080000 0.020000 5.100000 ( 5.093165) pre_compute_16 4.360000 0.010000 4.370000 ( 4.364988) "LOTTO" FINISHED DONE
Split the string in 8, look up each entry in a 128 entry lookup table and sum them up?
I know.. this is ridiculous... just sharing some ideas ;-)
精彩评论