开发者

Python basic data references, list of same reference

Say I have two lists:

>>> l1=[1,2,3,4]
>>> l2=[11,12,13,14]

I can put those lists in a tuple, or dictionary, and it appears that they are all references back to the origi开发者_开发技巧nal list:

>>> t=(l1,l2)
>>> d={'l1':l1, 'l2':l2}
>>> id(l1)==id(d['l1'])==id(t[0])
True
>>> l1 is d['l1'] is t[0]
True

Since they are references, I can change l1 and the referred data in the tuple and dictionary change accordingly:

>>> l1.append(5)
>>> l1
[1, 2, 3, 4, 5]
>>> t
([1, 2, 3, 4, 5], [11, 12, 13, 14])
>>> d
{'l2': [11, 12, 13, 14], 'l1': [1, 2, 3, 4, 5]}

Including if I append the reference in the dictionary d or mutable reference in the tuple t:

>>> d['l1'].append(6)
>>> t[0].append(7)
>>> d
{'l2': [11, 12, 13, 14], 'l1': [1, 2, 3, 4, 5, 6, 7]}
>>> l1
[1, 2, 3, 4, 5, 6, 7]

If I now set l1 to a new list, the reference count for the original list decreases:

>>> sys.getrefcount(l1)
4
>>> sys.getrefcount(t[0])
4
>>> l1=['new','list']
>>> l1 is d['l1'] is t[0]
False
>>> sys.getrefcount(l1)
2
>>> sys.getrefcount(t[0])
3

And appending or changing l1 does not change d['l1'] or t[0] since it now a new reference. The notion of indirect references is covered fairly well in the Python documents but not completely.

My questions:

  1. Is a mutable object always a reference? Can you always assume that modifying it modifies the original (Unless you specifically make a copy with l2=l1[:] kind of idiom)?

  2. Can I assemble a list of all the same references in Python? ie, Some function f(l1) that returns ['l1', 'd', 't'] if those all those are referring to the same list?

  3. It is my assumption that no matter what, the data will remain valid so long as there is some reference to it.

ie:

l=[1,2,3]         # create an object of three integers and create a ref to it
l2=l              # create a reference to the same object
l=[4,5,6]         # create a new object of 3 ints; the original now referenced 
                  # by l2 is unchanged and unmoved


1) Modifying a mutable object through a reference will always modify the "original". Honestly, this is betraying a misunderstanding of references. The newer reference is just as much the "original" as is any other reference. So long as both names point to the same object, modifying the object through either name will be reflected when accessed through the other name.

2) Not exactly like what you want. gc.get_referrers returns all references to the object.

>>> l = [1, 2]
>>> d = {0: l}
>>> t = (l, )
>>> import gc
>>> import pprint
>>> pprint.pprint(gc.get_referrers(l))
[{'__builtins__': <module '__builtin__' (built-in)>,
  '__doc__': None,
  '__name__': '__main__',
  '__package__': None,
  'd': {0: [1, 2]},
  'gc': <module 'gc' (built-in)>,
  'l': [1, 2],
  'pprint': <module 'pprint' from '/usr/lib/python2.6/pprint.pyc'>,
  't': ([1, 2],)},   # This is globals()

 {0: [1, 2]},  # This is d
 ([1, 2],)]   # this is t

Note that the actual object referenced by l is not included in the returned list because it does not contain a reference to itself. globals() is returned because that does contain a reference to the original list.

3) If by valid, you mean "will not be garbage collected" then this is correct barring a highly unlikely bug. It would be a pretty sorry garbage collector that "stole" your data.


Every variable in Python is a reference.

For lists, you are focusing on the results of the append() method, and loosing sight of the bigger picture of Python data structures. There are other methods on lists, and there are advantages and consequences to how a list is constructed. It is helpful to think of list as view on to other objects referred to in the list. They do not "containing" anything other than the rules and ways of accessing the data referred to by objects within them.

The list.append(x) method specifically is equivalent to l[len(l):]=[list]

So:

>>> l1=range(3)
>>> l2=range(20,23)
>>> l3=range(30,33)
>>> l1[len(l1):]=[l2]    # equivalent to 'append' for subscriptable sequences 
>>> l1[len(l1):]=l3      # same as 'extend'
>>> l1
[0, 1, 2, [20, 21, 22], 30, 31, 32]
>>> len(l1)
7
>>> l1.index(30)
4
>>> l1.index(20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list
>>> 20 in l1
False
>>> 30 in l1
True

By putting the list constructor around l2 in l1[len(l1):]=[l2], or calling l.append(l2), you create a reference that is bound to l2. If you change l2, the references will show the change as well. The length of that in the list is a single element -- the reference to the appended sequence.

With no constructor shortcut as in l1[len(l1):]=l3, you copy each element of the sequence.

If you use other common list methods, such as l.index(something), or in you will not find elements inside of the data references. l.sort() will not sort properly. They are "shallow" operations on the object, and by using l1[len(l1):]=[l2] you are now creating a recursive data structure.

If you use l1[len(l1):]=l3, you are making a true (shallow) copy of the elements in l3.

These are fairly fundamental Python idioms, and most of the time they 'do the right thing.' You can, however, get surprising results, such as:

>>> m=[[None]*2]*3
>>> m
[[None, None], [None, None], [None, None]]
>>> m[0][1]=33
>>> m
[[None, 33], [None, 33], [None, 33]]   # probably not what was intended...
>>> m[0] is m[1] is m[2]               # same object, that's why they all changed
True

Some Python newbies try to create a multi dimension by doing something like m=[[None]*2]*3 The first sequence replication works as expected; it creates 2 copies of None. It is the second that is the issue: it creates three copies of the reference to the first list. So entering m[0][1]=33 modifies the list inside the list bound to m and then all the bound references change to show that change.

Compare to:

>>> m=[[None]*2,[None]*2,[None]*2]
>>> m
[[None, None], [None, None], [None, None]]
>>> m[0][1]=33
>>> m
[[None, 33], [None, None], [None, None]]

You can also use nested list comprehensions to do the same like so:

>>> m=[[ None for i  in range(2)] for j in range(3)]
>>> m
[[None, None], [None, None], [None, None]]
>>> m[0][1]=44
>>> m
[[None, 44], [None, None], [None, None]]
>>> m[0] is m[1] is m[2]                      # three different lists....
False

For lists and references, Fredrik Lundh has this text for a good intro.

As to your specific questions:

1) In Python, Everything is a label or a reference to an object. There is no 'original' (a C++ concept) and there is no distinction between 'reference', pointer, or actual data (a C / Perl concept)

2) Fredrik Lundh has a great analogy about in reference to a question similar to this:

The same way as you get the name of that cat you found on your porch: the cat (object) itself cannot tell you its name, and it doesn't really care -- so the only way to find out what it's called is to ask all your neighbours (namespaces) if it's their cat (object)...

....and don't be surprised if you'll find that it's known by many names, or no name at all!

You can find this list with some effort, but why? Just call it what you are going to call it -- like a found cat.

3) True.


1- Is a mutable object always a reference? Can you always assume that modifying it modifies the original (Unless you specifically make a copy with l2=l1[:] kind of idiom)?

Yes. Actually non-mutable objects are always a reference as well. You just can't change them to perceive this.

2 - Can I assemble a list of all the same references in Python? ie, Some function f(l1) that returns ['l1', 'd', 't'] if those all those are referring to the same list?

That is odd, but it can be done.

You can compare objects for "samenes" with the is operator. Like in l1 is t[0]

And you can get all referred-to objects with the function gc.get_referrers in the garbage collector module (gc) -- You can check which of these referrers point o your object with the isoperator. So,yes, it can be done. I just don't think it would be a good idea. It is more likely the is operator offer a way for you to do waht you need alone

3- It is my assumption that no matter what, the data will remain valid so long as there is some reference to it.

Yes.


Is a mutable object always a reference? Can you always assume that modifying it modifies the original (Unless you specifically make a copy with l2=l1[:] kind of idiom)?

Python has reference semantics: variables do not store values as in C++, but instead label them. The concept of "the original" is flawed: if two variables label the same value, it is totally irrelevant which one "came first". It doesn't matter if the object is mutable or not (except that immutable objects won't make it so easy to tell what's going on behind the scenes). To make copies in a more general-purpose way, try the copy module.

Can I assemble a list of all the same references in Python? ie, Some function f(l1) that returns ['l1', 'd', 't'] if those all those are referring to the same list?

Not easily. Refer to aaronasterling's answer for details. You could also try something like k for k, v in locals().items() if v is the_object, but you'll also have to search globals(), you'll miss some stuff and it might cause some kind of problems due to recursing with the names 'k' and 'v' (I haven't tested).

It is my assumption that no matter what, the data will remain valid so long as there is some reference to it.

Absolutely.


  1. "... object is a reference ..." is nonsense. References aren't objects. Variables, member fields, slots in lists and sets, etc. hold references, and these references point to objects. There can be any number (in a non-refcouting implementations, even none - temporarily, i.e. until the GC kicks in) references to an object. Everyone who has a reference to an object can invoke it's methods, access it's members, etc. - this is true for all objects. Of course only mutable objects can be changed this way, so you usually don't care for immutable ones.
  2. Yes, as others have shown. But this shouldn't be necessary unless you're either debugging the GC or tracking down a serious memory leak in your code - why do you think you need this?
  3. Python has automatic memory management, so yes. As long as there is a reference to an object, it won't be deleted (however, it may stay alive for a while after it became unreachable, due to cyclic references and the fact that GCs only run once in a while).


1a. Is a mutable object always a reference?

There is no difference between mutable and non-mutable objects. Seeing the variable names as references is helpful for people with a C-background (but implies they can be dereferenced, which they can not).

1b. Can you always assume that modifying it modifies the original

Please, it's not "the original". It's the same object. b = a means b and a now are the same object.

1c. (Unless you specifically make a copy with l2=l1[:] kind of idiom)?

Right, because then it is not the same object anymore. (Although the entries n the list will be the same objects as the original list).

2. Can I assemble a list of all the same references in Python?

Yes, probably, but you will never ever ever need it, so that would be a waste of energy. :)

3. It is my assumption that no matter what, the data will remain valid so long as there is some reference to it.

Yes, an object will not be garbage collected as long as you have a reference to it. (Using the word "valid" here seems incorrect, but I assume this is what you mean).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜