Array Merge (Union)
I have two arrays I need to merge, and using the union (|) operator is PAINFULLY slow. Are there any other ways to accomplish an array merge?
Also, the arrays are filled with objects, not strings.
An example of the objects within the array:
#<Article
id: 1,
xml_document_id: 1,
source: "<article><domain>events.waikato.ac</domain><excerpt...",
created_at: "2010-02-11 01:32:46",
updated_at: "2010-02-11 01:41:28"
>
Where source is a short piece of XML.
EDIT
Sorry! By 'merge' I mean I need to not insert duplicates.
A => [1, 2, 3, 4, 5]
B => [3, 4, 5, 6, 7]
A.magic_merge(B) #=> [1, 2, 3, 4, 5, 6, 7]
Understanding that the integers are actually Article objects, and that the union operator appears to take forever.
Here's a script which benchmarks two merge techniques: using the pipe operator (a1 | a2) and using concatenate-and-uniq ((a1 + a2).uniq). Two additional benchmarks give the time of concat and uniq individually.
require 'benchmark'
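# fill two arrays with a million random integers each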
a1 = []; a2 = []
[a1, a2].each do |a|
1000000.times { a << rand(999999) }
end
puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }
puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }
puts "Concat only:"
puts Benchmark.measure { a1 + a2 }
puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }
On my machine (Ubuntu Karmic, Ruby 1.8.7), I get output like this:
Merge with pipe:
1.000000 0.030000 1.030000 ( 1.020562)
Merge with concat and uniq:
1.070000 0.000000 1.070000 ( 1.071448)
Concat only:
0.010000 0.000000 0.010000 ( 0.005888)
Uniq only:
0.980000 0.000000 0.980000 ( 0.981700)
Which shows that these two techniques are very similar in speed, and that uniq is the larger component of the operation. This makes sense intuitively: uniq must examine every element, so it is at least O(n), whereas concatenation is a straight bulk copy with a very small constant factor.
So, if you really want to speed this up, look at how hash and eql? are implemented for the objects in your arrays (uniq and | rely on these, not <=>, to detect duplicates). Most of the time is likely being spent comparing objects to ensure no duplicate pair survives into the final array.
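For example, here is a minimal sketch of id-based equality (the plain-Ruby Article class below is an illustration, not code from the question) that makes duplicate detection cheap by comparing only ids instead of the XML in source:

class Article
  attr_reader :id, :source

  def initialize(id, source)
    @id = id
    @source = source
  end

  # Two Articles with the same id count as duplicates, so uniq and |
  # only ever hash and compare integers, never the XML in #source.
  def eql?(other)
    other.is_a?(Article) && id == other.id
  end

  def hash
    id.hash
  end
end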
Do you need the items to be in a specific order within the arrays? If not, you may want to check whether using Sets makes it faster.
Update
Adding to another answerer's code:
require "set"
require "benchmark"
a1 = []; a2 = []
[a1, a2].each do |a|
1000000.times { a << rand(999999) }
end
s1, s2 = Set.new, Set.new
[s1, s2].each do |s|
1000000.times { s << rand(999999) }
end
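# note: Set#<< drops duplicates on insertion, so each Set ends up
# holding fewer elements than the million-element arrays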
puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }
puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }
puts "Concat only:"
puts Benchmark.measure { a1 + a2 }
puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }
puts "Using sets"
puts Benchmark.measure {s1 + s2}
puts "Starting with arrays, but using sets"
puts Benchmark.measure {s3, s4 = [a1, a2].map{|a| Set.new(a)} ; (s3 + s4)}
gives (for ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0])
Merge with pipe:
1.320000 0.040000 1.360000 ( 1.349563)
Merge with concat and uniq:
1.480000 0.030000 1.510000 ( 1.512295)
Concat only:
0.010000 0.000000 0.010000 ( 0.019812)
Uniq only:
1.460000 0.020000 1.480000 ( 1.486857)
Using sets
0.310000 0.010000 0.320000 ( 0.321982)
Starting with arrays, but using sets
2.340000 0.050000 2.390000 ( 2.384066)
Suggests that sets may or may not be faster, depending on your circumstances: converting existing arrays into sets costs more than a single array merge, but if the data lives in sets all along, the merge itself is several times faster.
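A minimal sketch of that idea, reusing the asker's A/B example (the variable names are illustrative): keep the collection in a Set from the start, so duplicates are dropped as items arrive and no separate uniq pass is needed.

require 'set'

# batch_one and batch_two stand in for the two arrays of Articles
batch_one = [1, 2, 3, 4, 5]
batch_two = [3, 4, 5, 6, 7]

merged = Set.new
[batch_one, batch_two].each do |batch|
  merged.merge(batch)   # Set#merge adds each element, dropping duplicates
end

merged.to_a   #=> [1, 2, 3, 4, 5, 6, 7] (order not guaranteed on 1.8)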
Using the Array#concat method will likely be a lot faster, according to my initial benchmarks using Ruby 1.8.7. Note, though, that concat simply appends and does not remove duplicates, so it only counts as a "merge" in the asker's sense when the two arrays are already disjoint:
require 'benchmark'
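# ActiveSupport::SecureRandom assumes ActiveSupport is already
# loaded (e.g. in a Rails console)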
def reset_arrays!
@array1 = []
@array2 = []
[@array1, @array2].each do |array|
10000.times { array << ActiveSupport::SecureRandom.hex }
end
end
reset_arrays! && puts(Benchmark.measure { @array1 | @array2 })
# => 0.030000 0.000000 0.030000 ( 0.026677)
reset_arrays! && puts(Benchmark.measure { @array1.concat(@array2) })
# => 0.000000 0.000000 0.000000 ( 0.000122)
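For clarity, a tiny illustration (made-up values) of why concat alone is not a deduplicating merge:

a = [1, 2, 3]
b = [3, 4]
a.concat(b)   #=> [1, 2, 3, 3, 4] -- the duplicate 3 survives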
Try this and see if it's any faster:
a = [1,2,3,3,2]
b = [1,2,3,4,3,2,5,7]
(a+b).uniq