Limited deep copy of an instance with a container of containers as an attribute
I have a class
- whose instances have attributes that are containers
- which themselves contain containers, each containing many items
- that has an expensive initialization of these containers
I want to create copies of instances such that
- the container attributes are copied, rather than shared as references, but
- the containers within each container are not deeply copied, but are shared references
- a call to the class's expensive __init__() method is avoided if possible
As an example, let's use the class SetDict below, which, when an instance is created, initializes a dictionary-like data structure as an attribute, d. d stores integers as keys and sets as values.
import collections

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)
I would like to copy instances of SetDict such that d is itself copied, but the sets that are its values are not deep-copied and are instead shared by reference.
For example, consider the current behavior of this class, where copy.copy doesn't copy the attribute d to the new copy, but copy.deepcopy creates completely new copies of the sets that are values of d.
>>> import copy
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> # Try a basic copy
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # But oh no! We unintentionally also added the new key to s.d!
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>>
>>> s = SetDict(3)
>>> # Try a deep copy
>>> u = copy.deepcopy(s)
>>> u.d[0].add(2)
>>> u.d[0]
set([1, 2])
>>> # But oh no! 2 didn't get added to s.d[0]'s set
>>> s.d[0]
set([1])
The behavior I'd like to see instead would be the following:
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # s.d retains the same key-value pairs
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t.d[0].add(2)
>>> t.d[0]
set([1, 2])
>>> # s.d[0] also had 2 added to its set
>>> s.d[0]
set([1, 2])
This was my first attempt to create a class that would do this, but it fails due to infinite recursion:
class CopiableSetDict(SetDict):
    def __copy__(self):
        import copy
        # This version gives infinite recursion, but conveys what we
        # intend to do.
        #
        # First, create a shallow copy of this instance
        other = copy.copy(self)
        # Then create a separate shallow copy of the d
        # attribute
        other.d = copy.copy(self.d)
        return other
I'm not sure how to properly override the copy.copy (or copy.deepcopy) behavior to achieve this. I'm also not entirely sure whether I should be overriding copy.copy or copy.deepcopy. How can I go about getting the desired copy behavior?
A class is a callable. When you call SetDict(3), the metaclass's __call__ (type.__call__ by default) first calls the constructor SetDict.__new__(SetDict, 3) and then, if the return value is an instance of SetDict, calls the initializer __init__(3) on it. So you can get a new instance of SetDict (or any other class) without calling its initializer by calling its constructor directly.
After that, you have an instance of your type and you can simply add regular copies of any container attributes and return it. Something like this should do the trick.
import collections
import copy

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)

    def __copy__(self):
        # Allocate the instance without running __init__
        other = SetDict.__new__(SetDict)
        # Copy the dict itself; its set values stay shared
        other.d = self.d.copy()
        return other
__new__ is a static method and takes the class to instantiate as its first argument. It should be as simple as this unless you're overriding __new__ to do something, in which case you should show what it is so that this can be modified. Here's the test code to demonstrate the behavior that you want.
t = SetDict(3)
print t.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
s = copy.copy(t)
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
t.d[3].add(1)
print t.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([1])})
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
s.d[0].add(2)
print t.d[0] # set([1, 2])
print s.d[0] # set([1, 2])
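To check that the copy really bypasses the initializer, one can count how often __init__ runs. The following is only a minimal sketch, written so that it runs on both Python 2 and Python 3; the init_calls counter is a hypothetical addition for illustration and is not part of the original class.

```python
import collections
import copy


class SetDict(object):
    # Hypothetical counter, added only to observe how often __init__ runs.
    init_calls = 0

    def __init__(self, size):
        SetDict.init_calls += 1
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)

    def __copy__(self):
        # Allocate an instance without running __init__.
        other = SetDict.__new__(SetDict)
        # Shallow-copy the dict: a new mapping holding the same set objects.
        other.d = self.d.copy()
        return other


t = SetDict(3)
s = copy.copy(t)

calls_after_copy = SetDict.init_calls   # __init__ ran for t only
separate_dicts = s.d is not t.d         # the mapping itself was copied
shared_sets = all(s.d[k] is t.d[k] for k in list(t.d))  # values are shared
```

Here calls_after_copy stays at 1 because copy.copy dispatches to __copy__, which never calls SetDict(...) itself.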
Another option is to have the __init__ method take a default argument copying=False. If copying is True, it can just return. That would be something like
class Foo(object):
    def __init__(self, value, copying=False):
        if copying:
            return
        self.value = value

    def __copy__(self):
        other = Foo(0, copying=True)
        other.value = self.value
        return other
I don't like this as much because you have to pass dummy arguments to the __init__ method when you're making a copy, and I like having an __init__ method whose sole purpose is to initialize an instance, not to decide whether an instance should be initialized.
Based on aaronsterling's solution, I cooked up the following, which I think is more flexible, if there are other attributes associated with the instance:
class CopiableSetDict(SetDict):
    def __copy__(self):
        # Create an uninitialized instance
        other = self.__class__.__new__(self.__class__)
        # Give it the same attributes (references)
        other.__dict__ = self.__dict__.copy()
        # Create a copy of d dict so other can have its own
        other.d = self.d.copy()
        return other
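A quick check of this generic version, as a sketch only: the LabeledSetDict subclass and its label attribute are hypothetical, added just to show that extra attributes are carried over by reference and that the copy keeps the subclass type. It is written to run on both Python 2 and Python 3.

```python
import collections
import copy


class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)


class CopiableSetDict(SetDict):
    def __copy__(self):
        cls = self.__class__
        # Create an uninitialized instance of the actual (sub)class.
        other = cls.__new__(cls)
        # Share every attribute by reference...
        other.__dict__ = self.__dict__.copy()
        # ...except d, which gets its own shallow copy.
        other.d = self.d.copy()
        return other


class LabeledSetDict(CopiableSetDict):
    # Hypothetical subclass with one extra attribute, for illustration.
    def __init__(self, size, label):
        super(LabeledSetDict, self).__init__(size)
        self.label = label


s = LabeledSetDict(3, ["original"])
t = copy.copy(s)

t.d[3] = set([2])   # new key lands only in the copy's dict
t.d[0].add(2)       # the set is shared, so s.d[0] sees the change too
```

This approach assumes instances store their attributes in __dict__; a class that defines __slots__ would need its attributes copied some other way.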