Why is '+' not understood by Python sets?
I would like to know why this is valid:
set(range(10)) - set(range(5))
but this is not valid:
set(range(10)) + set(range(5))
Is it because '+' could mean both intersection and union?
Python sets don't have an implementation for the + operator.
You can use | for set union and & for set intersection.
Sets do implement - as set difference. You can also use ^ for symmetric set difference (i.e., it returns a new set containing only the objects that appear in exactly one of the two sets).
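A short sketch of the four operators mentioned above, using the two sets from the question. The variable names are only for illustration; the last part checks that + really is rejected with a TypeError.

```python
a = set(range(10))
b = set(range(5))

union = a | b                 # elements in either set
intersection = a & b          # elements in both sets
difference = a - b            # elements of a that are not in b
symmetric_difference = a ^ b  # elements in exactly one of the two sets

# '+' is not defined for sets, so the expression below raises TypeError.
try:
    a + b
    plus_supported = True
except TypeError:
    plus_supported = False
```

Since b is a subset of a here, the difference and the symmetric difference happen to be the same set.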
Python chose to use | instead of + because set union is closely related to boolean disjunction. Bit vectors (which in Python are just int/long) define this operation across a sequence of boolean values and call it "bitwise or". In fact, this operation is so similar to set union that binary integers are sometimes also called "bit sets", where the elements of the set are taken to be natural numbers (the positions of the bits that are set).
Because int already defines the set-like operators |, & and ^, it was natural for the newer set type to use the same interface.
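To make the "bit set" correspondence concrete, here is a minimal sketch. The helpers to_bits and from_bits are hypothetical names made up for this example, not part of Python; they encode a set of small non-negative integers as an int and decode it again.

```python
def to_bits(elements):
    """Encode a set of small non-negative integers as an int bit vector."""
    bits = 0
    for n in elements:
        bits |= 1 << n  # set bit number n
    return bits


def from_bits(bits):
    """Decode an int bit vector back into the set of positions of its 1 bits."""
    return {n for n in range(bits.bit_length()) if (bits >> n) & 1}


a = {0, 1, 2, 5}
b = {1, 2, 3}

# The int operators mirror the set operators of the same spelling.
union_matches = from_bits(to_bits(a) | to_bits(b)) == a | b
intersection_matches = from_bits(to_bits(a) & to_bits(b)) == a & b
symmetric_matches = from_bits(to_bits(a) ^ to_bits(b)) == a ^ b
```

All three comparisons come out True, which is the sense in which int and set share one interface for these operations.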
In set theory the + symbol normally indicates the disjoint union of two sets. If A and B are sets, their disjoint union is defined to be the set
A + B = {(a, 1) | a in A} U {(b, 2) | b in B}
i.e., to construct the disjoint union, we mark all elements of A and all elements of B with different tags (in the example I used the numbers 1 and 2, but any two different "things" would do the job) and then take the union of the two resulting sets. In the above example, I have used 'U' for set union to make it more similar to the usual mathematical notation; below I use the Python notation, i.e. '|' for union, and '&' for intersection.
If A and B are disjoint, then A + B has a 1-to-1 correspondence with A | B. If they are not, then every common element x in A & B appears twice in A + B: once as (x, 1), and once as (x, 2).
So, since the '+' symbol has a quite well-established meaning as a set operation, I find it very consistent that Python does not use this symbol for set union or intersection. Python's designers probably had this in mind when they chose the set operators.
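The disjoint union defined above can be sketched in Python as follows. disjoint_union is a hypothetical helper written for this illustration (Python has no built-in for it), and the tags 1 and 2 are the same arbitrary choice as in the definition.

```python
def disjoint_union(a, b):
    """Tag elements of a with 1 and elements of b with 2, then take the union."""
    return {(x, 1) for x in a} | {(y, 2) for y in b}


A = {1, 2, 3}
B = {3, 4}

tagged = disjoint_union(A, B)  # the common element 3 appears as (3, 1) and (3, 2)
plain = A | B                  # the common element 3 appears only once
```

Here tagged has len(A) + len(B) elements, while plain has one fewer because A and B share the element 3.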
Sure, they could have used + to do a union, but they would then still need a symbol for intersection. | for union is symmetrical with & for intersection and thus makes a better choice.
Because | means union and & means intersection. There's clearly no reason to add multiple operators for the same function.
The reasons for using | and & probably go back to bitwise operations. If you represent a set as the bits in a number, those are the operators you'd use to do union and intersection.
+ simply isn't as tied to union as - is to set difference.
Because set difference is a very useful and commonly known concept, but there's no (universally used) concept of „set addition“.