Ordering things in python...?
I was under the impression that set() would order a collection much like .sort().
However, it seems that it doesn't. What puzzles me is why it reorders the collection at all.
>>> h = '321'
>>> set(h)
set(['1', '3', '2'])
>>> h
'321'
>>> h = '22311'
>>> set(h)
set(['1', '3', '2'])
Why doesn't it return set(['1', '2', '3'])? It also seems that no matter how many instances of each number I use, or in what order I use them, it always returns set(['1', '3', '2']). Why?
Edit:
So I have read your answers, and my counter to that is this:
>>> l = [1,2,3,3]
>>> set(l)
set([1, 2, 3])
>>> l = [3,3,2,3,1,1,3,2,3]
>>> set(l)
set([1, 2, 3])
Why does it order numbers and not strings?
Also
import random
l = []
for itr in xrange(101):
    l.append(random.randint(1,101))
print set(l)
Outputs
>>>
set([1, 2, 4, 5, 6, 8, 10, 11, 12, 14, 15, 16, 18, 19, 23, 24, 25, 26, 29, 30, 31, 32, 34, 40, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 69, 70, 74, 75, 77, 79, 80, 83, 84, 85, 87, 88, 89, 90, 93, 94, 96, 97, 99, 101])
A Python set is unordered, hence there is no guarantee that the elements will come back in the order in which you specified them.
If you want a sorted output, then call sorted:
sorted(set(h))
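As a minimal sketch of the suggestion above (written for Python 3, so the printed reprs differ from the Python 2 session in the question): sorted returns a new list and leaves the set untouched. The dict.fromkeys line is an addition that is not in the original answer, for the case where you want first-seen order rather than sorted order; it assumes Python 3.7+, where dicts preserve insertion order.

```python
h = '22311'

# sorted() accepts any iterable and returns a new, ordered list.
unique_sorted = sorted(set(h))
print(unique_sorted)  # ['1', '2', '3']

# Assumption: Python 3.7+, where dicts keep insertion order.
# This keeps each character the first time it appears.
unique_first_seen = list(dict.fromkeys(h))
print(unique_first_seen)  # ['2', '3', '1']
```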
Responding to your edit: it comes down to the implementation of set. In CPython, it boils down to two things:
1) the set will be sorted by hash (the __hash__ function) modulo a limit
2) the limit is generally the next largest power of 2
So let's look at the int case:
x=1
type(x) # int
x.__hash__() # 1
For ints, the hash equals the original value:
[x==x.__hash__() for x in xrange(1000)].count(False) # = 0
Hence, when all the values are small ints, the set uses the integer value itself as the hash, so the elements come out in ascending order.
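A quick way to check this claim yourself, sketched for Python 3 (range instead of xrange). The hash(x) == x part holds for small non-negative ints in CPython; the apparent ordering of the set is an implementation detail, so the sketch compares against sorted() at runtime instead of promising a result.

```python
# In CPython, small non-negative ints hash to themselves.
mismatches = [x for x in range(1000) if hash(x) != x]
print(len(mismatches))  # 0

# Whether a set of ints iterates in ascending order is an implementation
# detail, so check it rather than rely on it.
values = [3, 3, 2, 3, 1, 1, 3, 2, 3]
looks_sorted = list(set(values)) == sorted(set(values))
print(looks_sorted)

# sorted() gives ascending order regardless of how the set is laid out.
print(sorted(set(values)))  # [1, 2, 3]
```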
For the string representations, the hashes don't work the same way:
x='1'
type(x)
# str
x.__hash__()
# 6272018864
To understand why the sort breaks for ['1','2','3'], look at those hash values:
[str(x).__hash__() for x in xrange(1,4)]
# [6272018864, 6400019251, 6528019634]
In our example, the mod value is 4 (3 elements; 2^1 = 2 is too small, so the next power of 2 is 2^2 = 4), so
[str(x).__hash__()%4 for x in xrange(1,4)]
# [0, 3, 2]
[(str(x).__hash__()%4,str(x)) for x in xrange(1,4)]
# [(0, '1'), (3, '2'), (2, '3')]
Now if you sort this beast, you get the ordering that you see in set:
[y[1] for y in sorted([(str(x).__hash__()%4,str(x)) for x in xrange(1,4)])]
# ['1', '3', '2']
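The steps above can be sketched as one small function. This is only the simplified model from this answer, not CPython's actual set implementation (the real hash table also has to handle collisions and resizing). The name simulated_set_order is hypothetical, and the hash values are hard-coded from the output quoted above, because string hashes differ between Python builds and, in Python 3, between runs.

```python
def simulated_set_order(hashes, table_size):
    """Order items by (hash % table_size), as in the simplified model.

    `hashes` maps each item to the hash value to use for it.
    """
    return [item for _, item in
            sorted((h % table_size, item) for item, h in hashes.items())]


# Hash values copied from the session above; they are not portable.
quoted_hashes = {
    '1': 6272018864,
    '2': 6400019251,
    '3': 6528019634,
}

order = simulated_set_order(quoted_hashes, table_size=4)
print(order)  # ['1', '3', '2']
```

Passing the hashes in explicitly keeps the sketch reproducible; calling hash() on the strings directly would give different numbers on your machine.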
From the Python documentation of the set type:
A set object is an unordered collection of distinct hashable objects.
This means that the set doesn't have a concept of the order of the elements in it. You should not be surprised when the elements are printed on your screen in an unusual order.
A set in Python tries to be a "set" in the mathematical sense of the term. No duplicates, and order shouldn't matter.
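A short sketch of what that means in practice (Python 3 syntax): two sets built from different sequences compare equal as long as they contain the same elements.

```python
a = set('321')
b = set('22311')

# Same elements, so the sets are equal; input order and repeats are ignored.
print(a == b)  # True
print(len(b))  # 3

# Lists, by contrast, compare element by element in order.
print(list('321') == list('123'))  # False
```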