Limited deep copy of an instance with a container of containers as an attribute

2023-01-27 11:32

I have a class

- whose instances have attributes that are containers, which themselves contain containers, each containing many items
- that has an expensive initialization of these containers

I want to create copies of instances such that

- the container attributes are copied, rather than shared as references, but
- the containers within each container are not deeply copied; they remain shared references
- a call to the class's expensive __init__() method is avoided if possible

For an example, let's use the class SetDict, below, which, when creating an instance, initializes a dictionary-like data structure as an attribute, d. d stores integers as keys and sets as values.

import collections

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)

I would like to copy instances of SetDict, such that d is itself copied, but the sets that are its values are not deep-copied, and are instead only references to the sets.

For example, consider the current behavior of this class: copy.copy shares the attribute d with the new copy rather than copying it, while copy.deepcopy creates completely new copies of the sets that are the values of d.

>>> import copy
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> # Try a basic copy
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # But oh no! We unintentionally also added the new key to s.d!
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> 
>>> s = SetDict(3)
>>> # Try a deep copy
>>> u = copy.deepcopy(s)
>>> u.d[0].add(2)
>>> u.d[0]
set([1, 2])
>>> # But oh no! 2 didn't get added to s.d[0]'s set
>>> s.d[0]
set([1])

The behavior I'd like to see instead would be the following:

>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # s.d retains the same key-value pairs
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t.d[0].add(2)
>>> t.d[0]
set([1, 2])
>>> # s.d[0] also had 2 added to its set
>>> s.d[0]
set([1, 2])

This was my first attempt to create a class that would do this, but it fails due to infinite recursion:

class CopiableSetDict(SetDict):
    def __copy__(self):
        import copy
        # This version gives infinite recursion, but conveys what we
        # intend to do.
        #
        # First, create a shallow copy of this instance
        other = copy.copy(self)
        # Then create a separate shallow copy of the d
        # attribute
        other.d = copy.copy(self.d)
        return other

I'm not sure how to properly override the copy.copy (or copy.deepcopy) behavior to achieve this. I'm also not entirely sure if I should be overriding copy.copy or copy.deepcopy. How can I go about getting the desired copy behavior?

A class is a callable. When you call SetDict(3), type.__call__ (type being SetDict's metaclass) first calls the constructor SetDict.__new__(SetDict, 3) and then, if the return value is an instance of SetDict, calls the initializer __init__(3) on it. So you can get a new instance of SetDict (or any other class) without running its initializer by calling its constructor directly.
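As a minimal sketch of that point, written for Python 3: the class Expensive and its init_calls counter are made up for illustration and are not part of the question. Calling __new__ directly should yield an instance whose __init__ never ran:

```python
class Expensive(object):
    # Hypothetical class, for illustration only: counts how often __init__ runs.
    init_calls = 0

    def __init__(self, size):
        Expensive.init_calls += 1
        self.items = list(range(size))


# Normal construction: __new__ runs first, then __init__.
a = Expensive(3)

# Calling __new__ directly returns an instance but skips __init__,
# so the instance has no attributes yet.
b = Expensive.__new__(Expensive)
```

Here b is a genuine Expensive instance, but it is empty until you assign its attributes yourself, which is exactly what a custom __copy__ can do.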

After that, you have an instance of your type and you can simply add regular copies of any container attributes and return it. Something like this should do the trick.

import collections
import copy

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)

    def __copy__(self):
        other = SetDict.__new__(SetDict) 
        other.d = self.d.copy()
        return other

__new__ is a static method that takes the class to instantiate as its first argument. It should be as simple as this unless you're overriding __new__ to do something, in which case you should show what it is so that this can be modified. Here's the test code to demonstrate the behavior that you want.

t = SetDict(3)
print t.d  # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

s = copy.copy(t)
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

t.d[3].add(1)
print t.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([1])})
print s.d # defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})

s.d[0].add(2)
print t.d[0] # set([1, 2])
print s.d[0] # set([1, 2])

Another option is to have the __init__ method take a default argument copying=False. If copying is True, it could just return. That would be something like

class Foo(object):
    def __init__(self, value, copying=False):
        if copying:
            return
        self.value = value

    def __copy__(self):
        other = Foo(0, copying=True)
        other.value = self.value
        return other

I don't like this as much because you have to pass dummy arguments to the __init__ method when you're making a copy. I also prefer an __init__ method whose sole purpose is to initialize an instance, not to decide whether an instance should be initialized.

Based on aaronsterling's solution, I cooked up the following, which I think is more flexible, if there are other attributes associated with the instance:

class CopiableSetDict(SetDict):
    def __copy__(self):
        # Create an uninitialized instance
        other = self.__class__.__new__(self.__class__)
        # Give it the same attributes (references)
        other.__dict__ = self.__dict__.copy()
        # Create a copy of d dict so other can have its own
        other.d = self.d.copy()
        return other
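
A quick check of this approach, as a sketch written for Python 3. SetDict and CopiableSetDict are repeated so the snippet runs on its own. The init_calls counter and the label attribute are hypothetical additions, not part of the original classes; they are there only to show that __init__ is not run again and that extra attributes carry over:

```python
import collections
import copy


class SetDict(object):
    init_calls = 0  # hypothetical counter, added only to observe __init__ calls

    def __init__(self, size):
        SetDict.init_calls += 1
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)


class CopiableSetDict(SetDict):
    def __copy__(self):
        # Create an uninitialized instance of the same (sub)class
        other = self.__class__.__new__(self.__class__)
        # Share every attribute by reference ...
        other.__dict__ = self.__dict__.copy()
        # ... except d, which gets its own shallow copy
        other.d = self.d.copy()
        return other


s = CopiableSetDict(3)
s.label = "original"  # hypothetical extra attribute
t = copy.copy(s)

t.d[3] = {2}          # new key is added to t.d only
t.d[0].add(2)         # inner sets are shared, so s.d[0] sees the change too
```

Because __copy__ looks up self.__class__ rather than naming a class, subclasses of CopiableSetDict should get copies of their own type without overriding the method.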
