Fastest way to find the number of unique elements in a string
How can I find unique elements in a string in the best way?
Sample string format is
myString = "34345667543"
Expected output:
['3','4','3','5'.....]
This is an interesting question, and since it drew so many nearly identical answers, I ran a simple benchmark to decide which is actually the best solution:
require 'rubygems'
require 'benchmark'
require 'set'

puts "Do the test"

STRING_TEST = "26263636362626218118181111232112233"

Benchmark.bm(40) do |x|
  x.report("do split and uniq") do
    (1..1000000).each { STRING_TEST.split(//).uniq }
  end
  x.report("do chars to_a uniq") do
    (1..1000000).each { STRING_TEST.chars.to_a.uniq }
  end
  x.report("using Set") do
    (1..1000000).each { Set.new(STRING_TEST.split('')).to_a }
  end
end
and the results of this test are not entirely surprising (on 1.8.7p352):
                         user     system      total        real
do split and uniq   27.060000   0.000000  27.060000 ( 27.084629)
do chars to_a uniq  14.440000   0.000000  14.440000 ( 14.452377)
using Set           41.740000   0.000000  41.740000 ( 41.760313)
and on 1.9.2p180:
                         user     system      total        real
do split and uniq   19.260000   0.000000  19.260000 ( 19.242727)
do chars to_a uniq   8.980000   0.010000   8.990000 (  8.983891)
using Set           28.220000   0.000000  28.220000 ( 28.186787)
The results for REE (1.8.7) are close to 1.9.2:
                         user     system      total        real
do split and uniq   19.120000   0.000000  19.120000 ( 19.126034)
do chars to_a uniq  14.740000   0.010000  14.750000 ( 14.766540)
using Set           32.770000   0.120000  32.890000 ( 32.921878)
For fun, I also tried it on Rubinius:
                         user     system      total        real
do split and uniq   26.100000   0.000000  26.100000 ( 26.651468)
do chars to_a uniq  25.680000   0.000000  25.680000 ( 25.780944)
using Set           22.500000   0.000000  22.500000 ( 22.649291)
So while split(//).uniq wins points for readability, chars.to_a.uniq is almost twice as fast on MRI 1.8.7 and 1.9.2.
It is odd that on Rubinius the Set solution is the fastest, though it is nowhere near as fast as chars.to_a.uniq on 1.9.2.
Use this short form:
myString.split(//).uniq
>> "34345667543".chars.uniq
=> ["3", "4", "5", "6", "7"]
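Since the title asks for the number of unique elements rather than the elements themselves, one possible follow-up is to call size on the result. This is only a minimal sketch: the helper name unique_char_count is made up for illustration, and the to_a call is an assumption to keep it working on older Rubies where String#chars returns an Enumerator instead of an Array.

```ruby
# Hypothetical helper (the name is illustrative, not from this thread):
# returns how many distinct characters a string contains.
def unique_char_count(str)
  # to_a is a no-op on an Array and converts an Enumerator on older Rubies
  str.chars.to_a.uniq.size
end

puts unique_char_count("34345667543")  # prints 5
```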
Just use the split method:
"12345".split("")
Set.new("34345667543".chars)
I find this reads well: create a Set (which implies unique entries) from the characters in the string.
This is missing from the benchmark above, and is the second fastest in my tests with 1.9.3-p274 (the fastest is chars.to_a.uniq). We're still talking microbenchmarks here, though, so it's pretty unlikely to matter in an application :)
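To check this on your own machine, something along these lines could put the Set-from-chars variant next to two of the benchmarked ones. It is a rough sketch, not the original benchmark: the CANDIDATES table and the ITERATIONS value are my own assumptions (far fewer iterations than the original million, to keep the run short), and going through a lambda adds a little call overhead to every variant.

```ruby
require 'benchmark'
require 'set'

STRING_TEST = "26263636362626218118181111232112233"
ITERATIONS  = 50_000 # assumed value, much smaller than the original run

# Hypothetical table of candidates; each lambda returns an Array of
# the distinct characters so the variants can be compared directly.
CANDIDATES = {
  "chars to_a uniq" => lambda { STRING_TEST.chars.to_a.uniq },
  "Set from split"  => lambda { Set.new(STRING_TEST.split('')).to_a },
  "Set from chars"  => lambda { Set.new(STRING_TEST.chars).to_a }
}

Benchmark.bm(20) do |x|
  CANDIDATES.each do |label, code|
    x.report(label) { ITERATIONS.times { code.call } }
  end
end
```

Timings will vary with the Ruby version and the machine, so treat the ordering you see as specific to your setup.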
Take the characters from a string and make a Set out of them:
irb(main):001:0> require 'set'
irb(main):002:0> Set.new("123444454321".split(''))
=> #<Set: {"1", "2", "3", "4", "5"}>
The .split('') call just breaks the string into an array, character-wise. I originally used String#each_char, but that was new in 1.8.7, and you didn't mention what version of Ruby you're using.
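For reference, the String#each_char variant mentioned above might look roughly like this on 1.8.7 or later. The helper name unique_chars_via_each_char is hypothetical, and this is just one way to write it.

```ruby
require 'set'

# Hypothetical helper (name is illustrative): collects the distinct
# characters of a string into a Set by iterating with String#each_char.
def unique_chars_via_each_char(str)
  seen = Set.new
  str.each_char { |ch| seen << ch } # adding a duplicate to a Set is a no-op
  seen
end

p unique_chars_via_each_char("123444454321")
```

It avoids building the intermediate array that split('') creates, at the cost of being a little longer.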