Can I count on order being preserved in a Python tuple?
I've got a list of datetimes from which I want to construct time segments. In other words, turn [t0, t1, ... tn] into [(t0,t1),(t1,t2),...,(tn-1, tn)]. I've done it this way:
# start by sorting list of datetimes
mdtimes.sort()
# construct tuples which represent possible start and end dates
# left edges
dtg0 = [x for x in mdtimes]
dtg0.pop()
# right edges
dtg1 = [x for x in mdtimes]
dtg1.reverse()
dtg1.pop()
dtg1.sort()
dtsegs = zip(dtg0,dtg1)
Questions...
- Can I count on tn-1 < tn for any (tn-1,tn) after I've created them this way? (Is ordering preserved?)
- Is it good practice to copy the original mdtimes list with list comprehensions? If not, how should it be done?
- The purpose for constructing these tuples is to iterate over them and segment a data set with tn-1 and tn. Is this a reasonable approach? i.e.
datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
Thanks
- Tuple order is the order in which you insert values into the tuple. Tuples are not going to be sorted for you, if that is what you're asking. zip will likewise retain the order you inserted the values in.
- It's an acceptable method, but I have two alternative suggestions: use the copy module, or use dtg1 = mdtimes[:].
- Sounds reasonable.
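A minimal sketch of the copy options mentioned here, plus the slicing shortcut they lead to. The sample datetimes are made up for illustration; only mdtimes and dtsegs are names from the question.

```python
import copy
from datetime import datetime

# Hypothetical sample data, not from the question.
mdtimes = [datetime(2024, 1, 2), datetime(2024, 1, 1), datetime(2024, 1, 3)]
mdtimes.sort()

# Three equivalent ways to make a shallow copy of a list.
by_slice = mdtimes[:]
by_list = list(mdtimes)
by_copy = copy.copy(mdtimes)

# Changing a copy leaves the original list untouched.
by_slice.pop()

# Slices are already copies, so the left and right edges
# can be taken directly without pop() or reverse().
left_edges = mdtimes[:-1]
right_edges = mdtimes[1:]
dtsegs = list(zip(left_edges, right_edges))
```

Because mdtimes is sorted before slicing and zip pairs items positionally, each tuple comes out as (earlier, later).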
Both list and tuple are ordered.
dtg0, dtg1 = itertools.tee(mdtimes)
next(dtg0)
dtsegs = zip(dtg0, dtg1)
You can achieve the same with zip and a slice offset by one:
>>> l = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]
>>> list(zip(l, l[1:]))
[('t0', 't1'), ('t1', 't2'), ('t2', 't3'), ('t3', 't4'), ('t4', 't5'), ('t5', 't6')]
Instead of dtg0 = [x for x in mdtimes], dtg0 = mdtimes[:] would do, since you are just copying one list into another. Note: starting with Python 3.3, you can just say newlist = oldlist.copy().
As for order, zip's order is well defined, and both lists and tuples are ordered collections, so you should have no problem here.
Turning (x1, x2, x3, ...) into [(x1, x2), (x2, x3), ...] is called a pairwise combination, and it's so common a pattern that the itertools documentation provides a recipe:
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)      # advance the second iterator by one item
    return zip(a, b)   # use itertools.izip on Python 2

for ta, tb in pairwise(mdtimes):
    ...
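For a concrete run of that recipe, here is a small sketch with made-up datetimes (the sample values are assumptions for illustration). On Python 3.10 and later, itertools.pairwise is available in the standard library and could replace the hand-written function.

```python
from datetime import datetime
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # skip the first item on the second iterator
    return zip(a, b)

# Hypothetical sample data, sorted first as in the question.
mdtimes = sorted([
    datetime(2024, 1, 3),
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
])

dtsegs = list(pairwise(mdtimes))
for tleft, tright in dtsegs:
    print(tleft.date(), "->", tright.date())
```

No copy of mdtimes is made here; tee only buffers the one item by which the two iterators differ.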
This is an answer to the question "Is this a reasonable approach?" (which appears to have been ignored by all).
Summary: You may want/need to lift your gaze from making a pairwise thingy out of mdtimes to the encompassing problem (segmenting bigdata).
Detail:
The desired use of the result is expressed as:
datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
which is better expressed as:
datasegment = [x for x in bigdata if tleft < x['datetime'] < tright]
Note that as that stands, it will not include any cases where the timestamp is exactly equal to one of the boundary points, so let's change it to:
datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
But that's going to appear in a loop:
for tleft, tright in dtsegs:
    datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
    do_something_with(datasegment)
Whoops! That's going to take time proportional to len(bigdata) * len(dtsegs) ... what are likely values of len(bigdata) and len(dtsegs)?
If bigdata is sorted, what you want to do can be done in time proportional to N, where N = len(bigdata). If bigdata is not already sorted, it can be sorted in time proportional to N * log(N).
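One possible way to exploit the sorting, sketched with the standard bisect module. This is an illustration under assumptions, not necessarily what the answer had in mind: segment_sorted and the sample records are hypothetical, and each boundary lookup costs log(N) rather than being part of a single linear pass.

```python
from bisect import bisect_left
from datetime import datetime

def segment_sorted(bigdata, mdtimes):
    """Return one list of records per (tleft, tright) pair,
    keeping records with tleft <= x['datetime'] < tright."""
    # Sort once: N * log(N). Skip this if bigdata is already sorted.
    records = sorted(bigdata, key=lambda x: x['datetime'])
    stamps = [x['datetime'] for x in records]
    edges = sorted(mdtimes)
    # Index of the first record at or after each boundary.
    cuts = [bisect_left(stamps, t) for t in edges]
    # Consecutive cut positions delimit each segment.
    return [records[lo:hi] for lo, hi in zip(cuts, cuts[1:])]

# Hypothetical sample data: one record per hour, hours 0 to 5.
bigdata = [{'datetime': datetime(2024, 1, 1, h), 'value': h}
           for h in (5, 1, 3, 0, 4, 2)]
mdtimes = [datetime(2024, 1, 1, 1),
           datetime(2024, 1, 1, 3),
           datetime(2024, 1, 1, 5)]

segments = segment_sorted(bigdata, mdtimes)
```

Each record is copied into at most one segment, so the work after sorting is proportional to N plus len(mdtimes) * log(N), instead of len(bigdata) * len(dtsegs).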
You might like to ask another question ...
It's also worth pointing out that any items in bigdata that have a timestamp < min(mdtimes) or >= max(mdtimes) will not be included in any data segment ... is this intentional?
I'm no expert, but aren't you quadrupling your memory requirements by copying the list and then making a new list of pairs taken from two lists? Why not just do the following:
dtsegs = [(mdtimes[i], mdtimes[i+1]) for i in range(len(mdtimes)-1)]
Dunno how "Pythonic" that is, though.
EDIT: Actually, looking at what you need to do with this list of tuples, you could just do this [i] and [i+1] stuff directly at that level and not even create this new structure at all. I don't know how many dates you're dealing with, though - if it's some small number I suppose it doesn't really matter.
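A rough sketch of what indexing "directly at that level" might look like, with no intermediate list of tuples. The sample data and the counts list are made up for illustration; the half-open comparison is borrowed from another answer in this thread.

```python
from datetime import datetime

# Hypothetical sample data: one record per hour, hours 0 to 5.
bigdata = [{'datetime': datetime(2024, 1, 1, h), 'value': h}
           for h in range(6)]
mdtimes = sorted([datetime(2024, 1, 1, 1),
                  datetime(2024, 1, 1, 3),
                  datetime(2024, 1, 1, 5)])

counts = []
for i in range(len(mdtimes) - 1):
    # Read each pair of boundaries straight from the sorted list.
    tleft, tright = mdtimes[i], mdtimes[i + 1]
    datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
    counts.append(len(datasegment))
```

This saves the memory for the pairs but still scans all of bigdata once per segment.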
For what it's worth, a couple of the other answerers here seem to be misunderstanding your question, though I can't comment on their posts since I don't have enough reputation yet :) Ignacio Vazquez-Abrams's solution seems the best to me, though his "next(dtg0)" should be "next(dtg1)": advancing dtg0 makes every pair come out reversed, as (t1, t0), (t2, t1), ...