
Limited deep copy of an instance with a container of containers as an attribute

I have a class

  • whose instances have attributes that are containers
    • which themselves contain containers, each containing many items
  • whose initialization of these containers is expensive

I want to create copies of instances such that

  1. the container attributes are copied, rather than shared as references, but
  2. the containers within each container are not deeply copied, but are shared references
  3. a call to the class's expensive __init__() method is avoided if possible

As an example, let's use the class SetDict, below. When an instance is created, it initializes a dictionary-like data structure as an attribute, d, which stores integers as keys and sets as values.

import collections

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)

I would like to copy instances of SetDict, such that d is itself copied, but the sets that are its values are not deep-copied, and are instead only references to the sets.

For example, consider the following behavior currently for this class, where copy.copy doesn't copy the attribute d to the new copy, but copy.deepcopy creates completely new copies of the sets that are values of d.

>>> import copy
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> # Try a basic copy
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # But oh no! We unintentionally also added the new key to s.d!
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> 
>>> s = SetDict(3)
>>> # Try a deep copy
>>> u = copy.deepcopy(s)
>>> u.d[0].add(2)
>>> u.d[0]
set([1, 2])
>>> # But oh no! 2 didn't get added to s.d[0]'s set
>>> s.d[0]
set([1])

The behavior I'd like to see instead would be the following:

>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # s.d retains the same key-value pairs
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t.d[0].add(2)
>>> t.d[0]
set([1, 2])
>>> # s.d[0] also had 2 added to its set
>>> s.d[0]
set([1, 2])

This was my first attempt to create a class that would do this, but it fails due to infinite recursion:

class CopiableSetDict(SetDict):
    def __copy__(self):
        import copy
        # This version gives infinite recursion, but conveys what we
        # intend to do.
        #
        # First, create a shallow copy of this instance
        other = copy.copy(self)
        # Then create a separate shallow copy of the d
        # attribute
        other.d = copy.copy(self.d)
        return other

I'm not sure how to properly override the copy.copy (or copy.deepcopy) behavior to achieve this. I'm also not entirely sure if I should be overriding copy.copy or copy.deepcopy. How can I go about getting the desired copy behavior?


A class is a callable. When you call SetDict(3), the call goes through the metaclass's __call__ (type.__call__ by default), which first calls the constructor SetDict.__new__(SetDict) and then calls the initializer __init__(3) on the return value of __new__ if that value is an instance of SetDict. So you can get a new instance of SetDict (or any other class) without calling its initializer by calling its constructor directly.

After that, you have an uninitialized instance of your type; assign shallow copies of any container attributes to it and return it. Something like this should do the trick.

import collections
import copy

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)

    def __copy__(self):
        other = SetDict.__new__(SetDict) 
        other.d = self.d.copy()
        return other
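
The two-step construction described above can be sketched roughly as follows. This is only an approximation of what happens when a class is called, not the interpreter's actual implementation; Point and construct are hypothetical names used for illustration.

```python
class Point(object):
    # Hypothetical class, used only for this illustration
    def __init__(self, x):
        self.x = x


def construct(cls, *args, **kwargs):
    # Rough approximation of what calling cls(*args, **kwargs) does
    obj = cls.__new__(cls)             # step 1: allocate a bare instance
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)  # step 2: run the initializer
    return obj


p = construct(Point, 3)
print(p.x)                 # 3

# Calling __new__ directly performs step 1 only, so __init__ never runs
raw = Point.__new__(Point)
print(hasattr(raw, "x"))   # False
```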

__new__ is a static method that takes the class to instantiate as its first argument. This should be all you need unless you're overriding __new__ to do something; in that case, show what it does so that this can be adapted. Here's test code to demonstrate the behavior that you want.

t = SetDict(3)
print t.d  # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

s = copy.copy(t)
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

t.d[3].add(1)
print t.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([1])})
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

s.d[0].add(2)
print t.d[0] # set([1, 2])
print s.d[0] # set([1, 2])
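
To check that the copy really does skip the expensive initializer, a call counter can be added. This is a minimal sketch; CountingSetDict and init_calls are hypothetical names that are not part of the original class, and print is called with a single argument so that it runs on both Python 2 and Python 3.

```python
import collections
import copy


class CountingSetDict(object):
    # Hypothetical variant of SetDict that counts __init__ calls
    init_calls = 0

    def __init__(self, size):
        CountingSetDict.init_calls += 1
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)

    def __copy__(self):
        # Allocate a bare instance; __init__ is not run here
        other = CountingSetDict.__new__(CountingSetDict)
        # Shallow-copy the outer dict; the sets inside stay shared
        other.d = self.d.copy()
        return other


t = CountingSetDict(3)
s = copy.copy(t)

print(CountingSetDict.init_calls)  # 1
print(s.d is t.d)                  # False
print(s.d[0] is t.d[0])            # True
```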


Another option is to have the __init__ method take a keyword argument copying=False. If copying is True, it can just return. That would look something like

class Foo(object):
    def __init__(self, value, copying=False):
        if copying:
            return
        self.value = value

    def __copy__(self):
        other = Foo(0, copying=True)
        other.value = self.value
        return other

I don't like this as much, because you have to pass dummy arguments to the __init__ method when you're making a copy. I also prefer an __init__ method whose sole purpose is to initialize an instance, not to decide whether an instance should be initialized.


Based on aaronsterling's solution, I cooked up the following, which I think is more flexible, if there are other attributes associated with the instance:

class CopiableSetDict(SetDict):
    def __copy__(self):
        # Create an uninitialized instance
        other = self.__class__.__new__(self.__class__)
        # Give it the same attributes (references)
        other.__dict__ = self.__dict__.copy()
        # Create a copy of d dict so other can have its own
        other.d = self.d.copy()
        return other
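
As a quick check of the version above, here is a sketch with a subclass that carries extra attributes. LabeledSetDict, label, and tags are hypothetical names added for illustration. It suggests that the copy keeps the subclass type and the extra attributes, and that only d gets its own dict.

```python
import collections
import copy


class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)


class CopiableSetDict(SetDict):
    def __copy__(self):
        other = self.__class__.__new__(self.__class__)
        other.__dict__ = self.__dict__.copy()
        other.d = self.d.copy()
        return other


class LabeledSetDict(CopiableSetDict):
    # Hypothetical subclass with attributes beyond d
    def __init__(self, size, label):
        super(LabeledSetDict, self).__init__(size)
        self.label = label
        self.tags = []


a = LabeledSetDict(2, "first")
b = copy.copy(a)

print(type(b) is LabeledSetDict)  # True
print(b.label)                    # first
print(b.d is a.d)                 # False
print(b.tags is a.tags)           # True
```

Note that every attribute other than d is still shared by reference (tags above), so any other container that needs to be independent must be copied explicitly as well. The approach also assumes instances have a __dict__, which is not the case for classes that define __slots__ without it.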