
Fastest way to find the number of unique elements in a string

What is the best way to find the unique elements (characters) in a string?

Sample string:

myString = "34345667543"

Expected output:

  ['3','4','3','5'.....]
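The title asks for the number of unique characters, while the body asks for the characters themselves; both come from the same call. A minimal sketch, assuming Ruby 2.0 or later, where String#chars returns an Array (on 1.8.7 and 1.9 it returns an Enumerator, so you would write chars.to_a.uniq instead):

```ruby
my_string = "34345667543"

# uniq keeps the first occurrence of each character, in order of appearance.
unique_chars = my_string.chars.uniq

# The "number of unique elements" is just the size of that array.
unique_count = unique_chars.size

p unique_chars  # prints ["3", "4", "5", "6", "7"]
p unique_count  # prints 5
```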


This is an interesting question, and since it has so many nearly identical answers, I ran a simple benchmark to decide which is actually the best solution:

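require 'rubygems'
require 'benchmark'
require 'set'

puts "Do the test"

# Each report runs one approach 1,000,000 times on the same test string.
Benchmark.bm(40) do |x|

  STRING_TEST = "26263636362626218118181111232112233"

  # Split on an empty regexp (one element per character), then drop duplicates.
  x.report("do split and uniq") do
    (1..1000000).each { STRING_TEST.split(//).uniq }
  end

  # String#chars returns an Enumerator on 1.8.7/1.9, hence to_a before uniq.
  x.report("do chars to_a uniq") do
    (1..1000000).each { STRING_TEST.chars.to_a.uniq }
  end

  # Build a Set (unique by construction) and convert it back to an Array.
  x.report("using Set") do
    (1..1000000).each { Set.new(STRING_TEST.split('')).to_a }
  end

end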
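For anyone re-running this comparison on a current Ruby, here is a minimal sketch under stated assumptions: Ruby 2.4 or later (so String#chars returns an Array and Enumerable#uniq exists on the each_char Enumerator), a reduced iteration count chosen arbitrarily to keep the run short, and Benchmark.bmbm instead of bm so a rehearsal pass runs first. It also checks that every candidate returns the same characters before timing anything. No results are claimed here; the numbers depend on the Ruby version and the machine.

```ruby
require 'benchmark'
require 'set'

STRING_TEST = "26263636362626218118181111232112233"
ITERATIONS  = 100_000 # assumption: smaller than the original 1,000,000

# Candidate implementations, keyed by the label used in the report.
APPROACHES = {
  "split(//).uniq" => ->(s) { s.split(//).uniq },
  "chars.uniq"     => ->(s) { s.chars.uniq },
  "each_char.uniq" => ->(s) { s.each_char.uniq },
  "Set.new(chars)" => ->(s) { Set.new(s.chars).to_a },
}

# Sanity check: all approaches must agree (compared sorted) before timing.
expected = APPROACHES.values.first.call(STRING_TEST).sort
APPROACHES.each do |label, fn|
  result = fn.call(STRING_TEST).sort
  raise "#{label} disagrees: #{result.inspect}" unless result == expected
end

# bmbm runs a rehearsal pass, then the measured pass.
Benchmark.bmbm(20) do |x|
  APPROACHES.each do |label, fn|
    x.report(label) { ITERATIONS.times { fn.call(STRING_TEST) } }
  end
end
```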

And the results of this test, not entirely surprising (on 1.8.7p352):

                                              user     system      total        real
do split and uniq                        27.060000   0.000000  27.060000 ( 27.084629)
do chars to_a uniq                       14.440000   0.000000  14.440000 ( 14.452377)
using Set                                41.740000   0.000000  41.740000 ( 41.760313)

And on 1.9.2p180:

                                              user     system      total        real
do split and uniq                        19.260000   0.000000  19.260000 ( 19.242727)
do chars to_a uniq                        8.980000   0.010000   8.990000 (  8.983891)
using Set                                28.220000   0.000000  28.220000 ( 28.186787)

The results for REE (1.8.7) are close to those of 1.9.2:

                                              user     system      total        real
do split and uniq                        19.120000   0.000000  19.120000 ( 19.126034)
do chars to_a uniq                       14.740000   0.010000  14.750000 ( 14.766540)
using Set                                32.770000   0.120000  32.890000 ( 32.921878)

For fun, I also tried it on Rubinius:

                                              user     system      total        real
do split and uniq                        26.100000   0.000000  26.100000 ( 26.651468)
do chars to_a uniq                       25.680000   0.000000  25.680000 ( 25.780944)
using Set                                22.500000   0.000000  22.500000 ( 22.649291)

So while split(//).uniq wins points for readability, chars.to_a.uniq is roughly twice as fast on MRI 1.8.7 and 1.9.2 (the gap is smaller on REE, and the two are about even on Rubinius).

Oddly, on Rubinius the Set solution is the fastest, but it is nowhere near as fast as chars.to_a.uniq on 1.9.2.
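The to_a in the fastest variant is there for version compatibility: on 1.8.7 and 1.9, String#chars returns an Enumerator, which has no uniq until Enumerable#uniq arrived in Ruby 2.4, while from Ruby 2.0 on it returns an Array. A small sketch of a portable helper (the name unique_chars is hypothetical):

```ruby
# Hypothetical helper. String#chars returns an Enumerator on Ruby 1.8.7/1.9
# and an Array on Ruby 2.0+; to_a normalizes both to an Array
# (Array#to_a just returns the receiver), so uniq is always available.
def unique_chars(str)
  str.chars.to_a.uniq
end

p unique_chars("34345667543")  # prints ["3", "4", "5", "6", "7"]
p unique_chars("").size        # prints 0
```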


A short version:

myString.split(//).uniq


>> "34345667543".chars.uniq
=> ["3", "4", "5", "6", "7"]


Just use the split method, followed by uniq to drop the duplicates:

"12345".split("").uniq


require 'set'
Set.new("34345667543".chars)

I find this reads well: create a Set (which implies unique entries) from the characters in the string.

This approach is missing from the benchmark above; in my tests with 1.9.3-p274 it was the second fastest (the fastest was chars.to_a.uniq). That said, these are microbenchmarks, so the difference is unlikely to matter in an application :)
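If what you actually need is the count, or a check for whether a character occurs, the Set can be used directly without converting it back to an Array. A minimal sketch using only the standard Set#size and Set#include? methods:

```ruby
require 'set'

unique = Set.new("34345667543".chars)

# The Set already holds each character exactly once.
puts unique.size            # prints 5
puts unique.include?("6")   # prints true
puts unique.include?("9")   # prints false
```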


Take the characters from a string and make a Set out of them:

irb(main):001:0> require 'set'
irb(main):002:0> Set.new("123444454321".split(''))
=> #<Set: {"1", "2", "3", "4", "5"}>

The .split('') call just breaks the string into an array, character-wise. I originally used String#each_char, but that was new in 1.8.7, and you didn't mention what version of Ruby you're using.
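On Ruby 1.8.7 or later, the String#each_char variant mentioned above can feed the Set directly: called without a block it returns an Enumerator, and Set.new accepts any enumerable, adding the characters one at a time instead of first building an intermediate Array as split('') does. A minimal sketch:

```ruby
require 'set'

str = "123444454321"

# each_char without a block returns an Enumerator; Set.new iterates it
# and adds each character, so duplicates are dropped as they arrive.
unique = Set.new(str.each_char)

puts unique.size        # prints 5
p unique.to_a.sort      # prints ["1", "2", "3", "4", "5"]
```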
