
What's the most efficient way to deep copy an object in Ruby?

As far as I know, serializing an object is the only way to effectively deep-copy it (as long as it isn't stateful, like an IO), but is one approach noticeably more efficient than the others?

For example, since I'm using Rails, I could use ActiveSupport::JSON or to_xml, and from what I can tell, marshalling the object is one of the most widely accepted ways to do this. I'd expect marshalling to be the most efficient of these since it's built into Ruby, but am I missing anything?

Edit: note that the implementation itself is already covered. I don't want to replace the existing shallow copy methods (like dup and clone), so I'll most likely just add Object::deep_copy, backed by whichever of the above methods (or any you suggest :) has the least overhead.


I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes; I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for a quick and easy implementation, Marshal appears to be the way to go.
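Since the question proposes an Object::deep_copy and Marshal is the quick option, here is a minimal sketch of that helper. It assumes reopening Object is acceptable in your app; deep_copy is the name from the question, not a built-in Ruby or Rails method.

```ruby
# Hypothetical helper: the name comes from the question; this is a sketch
# built on Marshal, not a standard Ruby or Rails method.
class Object
  def deep_copy
    # Serialize the whole object graph to a String, then rebuild it.
    # Raises TypeError for objects Marshal cannot dump (Proc, IO, objects
    # with singleton methods, ...).
    Marshal.load(Marshal.dump(self))
  end
end

original = { 'a' => { x: [1, 'b'] } }
copy = original.deep_copy
copy['a'][:x][1] << 'c'    # mutate a nested String in the copy
puts original['a'][:x][1]  # prints b: the original is untouched
```

The TypeError case is the main practical limit: anything holding a Proc, IO, or similar will need its own copy logic.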

I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).

Two notes regarding my JSON solution. First, I used the C variant (json/ext), version 1.4.3. Second, it doesn't produce an exact copy, as Symbols are converted to Strings.
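The Symbol-to-String problem is easy to reproduce. This sketch uses JSON.generate/JSON.parse from the json standard library instead of the JSON.dump/JSON.load pair in the benchmark (the Symbol behavior is the same), and compares the result with a Marshal round trip:

```ruby
require 'json'

value = { 'a' => { x: [1, nil] } }

# JSON has no Symbol type, so the :x key is written out as the string "x".
json_copy    = JSON.parse(JSON.generate(value))
# Marshal records the Symbol as a Symbol.
marshal_copy = Marshal.load(Marshal.dump(value))

p json_copy    # the inner key is now the String "x"
p marshal_copy # the inner key is still the Symbol :x
```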

This was all run with ruby 1.9.2p180.

#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'

def dc1(value)
  Marshal.load(Marshal.dump(value))
end

def dc2(value)
  YAML.load(YAML.dump(value))
end

def dc3(value)
  JSON.load(JSON.dump(value))
end

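# Custom recursive copy: only Hash and Array containers are duplicated;
# leaf values such as Strings are returned as-is (shared, not copied).
def dc4(value)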
  if value.is_a?(Hash)
    result = value.clone
    value.each{|k, v| result[k] = dc4(v)}
    result
  elsif value.is_a?(Array)
    result = value.clone
    result.clear
    value.each{|v| result << dc4(v)}
    result
  else
    value
  end
end

def dc5(value)
  MessagePack.unpack(value.to_msgpack)
end

value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}

Benchmark.bm do |x|
  iterations = 10000
  x.report {iterations.times {dc1(value)}}
  x.report {iterations.times {dc2(value)}}
  x.report {iterations.times {dc3(value)}}
  x.report {iterations.times {dc4(value)}}
  x.report {iterations.times {dc5(value)}}
end

results in:

                 user     system      total        real
Marshal      0.230000   0.000000   0.230000 (  0.239257)
YAML         3.240000   0.030000   3.270000 (  3.262255)
JSON         0.590000   0.010000   0.600000 (  0.601693)
Custom       0.060000   0.000000   0.060000 (  0.067661)
MessagePack  0.090000   0.010000   0.100000 (  0.097705)
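One thing to keep in mind when reading the Custom row: dc4 only rebuilds Hashes and Arrays and hands back leaf values untouched, while Marshal rebuilds every String. That may account for part of its speed advantage. A small check, reusing the same dc4 logic:

```ruby
# Same recursive strategy as dc4 in the benchmark: only Hash and Array
# are duplicated; everything else is returned as the same object.
def dc4(value)
  if value.is_a?(Hash)
    result = value.clone
    value.each { |k, v| result[k] = dc4(v) }
    result
  elsif value.is_a?(Array)
    result = value.clone
    result.clear
    value.each { |v| result << dc4(v) }
    result
  else
    value
  end
end

value   = { 'b' => ['z'] }
custom  = dc4(value)
marshal = Marshal.load(Marshal.dump(value))

puts custom['b'].equal?(value['b'])        # false: the Array was duplicated
puts custom['b'][0].equal?(value['b'][0])  # true: the String 'z' is shared
puts marshal['b'][0].equal?(value['b'][0]) # false: Marshal rebuilt the String
```

Mutating a String in place (for example with <<) in the dc4 copy would therefore also change the original.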


I think you need to add an initialize_copy method to the class you are copying and put the deep-copy logic there. When you call clone (or dup), Ruby calls that method on the new copy. I haven't done it myself, but that's my understanding.
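A minimal sketch of the initialize_copy approach. The Playlist class is hypothetical, used only for illustration. By the time initialize_copy runs, clone/dup have already copied the instance variables shallowly, so the method only has to replace the references that should not be shared:

```ruby
# Hypothetical class used only to illustrate initialize_copy.
class Playlist
  attr_reader :name, :tracks

  def initialize(name, tracks)
    @name = name
    @tracks = tracks
  end

  # Called on the new object by dup/clone after the instance variables
  # have been copied shallowly; swap in copies of the mutable ones.
  def initialize_copy(source)
    super
    @name = source.name.dup
    @tracks = source.tracks.map(&:dup)
  end
end

a = Playlist.new('mix', ['one', 'two'])
b = a.clone
b.tracks << 'three'
b.tracks[0] << '!'
p a.tracks # ["one", "two"]: neither the Array nor its Strings were shared
p b.tracks # ["one!", "two", "three"]
```

Because the logic lives in initialize_copy, both dup and clone get the deeper behavior without redefining either method.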

I think plan B would be just overriding the clone method:

class CopyMe
    attr_accessor :var
    def initialize var=''
      @var = var
    end    
    def clone deep= false
      deep ? CopyMe.new(@var.clone) : CopyMe.new()
    end
end

a = CopyMe.new("test")  
puts "A: #{a.var}"
b = a.clone
puts "B: #{b.var}"
c = a.clone(true)
puts "C: #{c.var}"

Output

mike@sleepycat:~/projects$ ruby ~/Desktop/clone.rb 
A: test
B: 
C: test

I'm sure you could make that cooler with a little tinkering but for better or for worse that is probably how I would do it.


Probably the reason Ruby doesn't contain a deep clone has to do with the complexity of the problem. See the notes at the end.

To make a clone that will "deep copy" Hashes, Arrays, and elemental values, i.e., make a copy of each element in the original such that the copy has the same values but new objects, you can use this:

class Object
  def deepclone
    case
    when self.class==Hash
      hash = {}
      self.each { |k,v| hash[k] = v.deepclone }
      hash
    when self.class==Array
      array = []
      self.each { |v| array << v.deepclone }
      array
    else
      if defined?(self.class.new)
        self.class.new(self)
      else
        self
      end
    end
  end
end

If you want to redefine the behavior of Ruby's clone method, you can name it clone instead of deepclone (in all 3 places), but I have no idea how redefining Ruby's clone will affect Ruby libraries or Ruby on Rails, so caveat emptor. Personally, I can't recommend doing that.

For example:

a = {'a'=>'x','b'=>'y'}                          => {"a"=>"x", "b"=>"y"}
b = a.deepclone                                  => {"a"=>"x", "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 15227640 / 15209520

If you want your classes to deepclone properly, their new method (initialize) must be able to deepclone an object of that class in the standard way: if the first parameter is given, it is assumed to be the object to be deepcloned.

Suppose we want a class M, for example. Its first parameter must be an optional object of class M. Here a second optional argument, z, pre-sets the value of z in the new object.

class M
  attr_accessor :z
  def initialize(m=nil, z=nil)
    if m
      # deepclone all the variables in m to the new object
      @z = m.z.deepclone
    else
      # default all the variables in M
      @z = z # default is nil if not specified
    end
  end
end

The z pre-set is ignored during cloning here, but your method may have a different behavior. Objects of this class would be created like this:

# a new 'plain vanilla' object of M
m=M.new                                        => #<M:0x0000000213fd88 @z=nil>
# a new object of M with m.z pre-set to 'g'
m=M.new(nil,'g')                               => #<M:0x00000002134ca8 @z="g">
# a deepclone of m in which the strings are the same value, but different objects
n=m.deepclone                                  => #<M:0x00000002131d00 @z="g">
puts "#{m.z.object_id} / #{n.z.object_id}" => 17409660 / 17403500

Where objects of class M are values in a Hash:

a = {'a'=>M.new(nil,'g'),'b'=>'y'}               => {"a"=>#<M:0x00000001f8bf78 @z="g">, "b"=>"y"}
b = a.deepclone                                  => {"a"=>#<M:0x00000001766f28 @z="g">, "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 12303600 / 12269460
puts "#{a['b'].object_id} / #{b['b'].object_id}" => 16811400 / 17802280

Notes:

  • If deepclone tries to clone an object which doesn't clone itself in the standard way, it may fail.
  • If deepclone tries to clone an object which can clone itself in the standard way, and if it is a complex structure, it may (and probably will) make a shallow clone of itself.
  • deepclone doesn't deep copy the keys in Hashes, because keys are not usually treated as data; if you change hash[k] to hash[k.deepclone], they will be deep copied too.
  • Certain elemental values, such as Fixnum (Integer since Ruby 2.4), have no new method. These objects always have the same object ID, so they are returned as-is rather than cloned.
  • Be careful: when you deep copy, two parts of your Hash or Array that referred to the same object in the original will refer to different objects in the deepclone.
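The last note is one place where Marshal behaves differently: it keeps track of objects it has already written, so shared references survive the round trip. The same bookkeeping lets it copy self-referencing structures, on which the recursive deepclone above would keep recursing until the stack overflows. A quick check:

```ruby
shared   = 'same'
original = { 'x' => shared, 'y' => shared } # two values, one String object

copy = Marshal.load(Marshal.dump(original))

puts original['x'].equal?(original['y']) # true
puts copy['x'].equal?(copy['y'])         # true: the sharing is preserved
puts copy['x'].equal?(original['x'])     # false: but it is a new String

loop_list = []
loop_list << loop_list                   # an Array that contains itself
loop_copy = Marshal.load(Marshal.dump(loop_list))
puts loop_copy[0].equal?(loop_copy)      # true: the cycle is rebuilt
```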