how to split a list in two at the point where predicate is first False
I keep thinking there should be a function for this, but I've searched the likely places (google, itertools docs, list methods, other SO questions), but nowhere found quite what I was looking for.
Naive and working implementation:
def split_at_first_false(pred, seq):
first = []
second = []
true_so_far = True
for item in seq:
if true_so_far and pred(item):
first.append(item)
else:
true_so_far = False
second.append(item)
return first, second
print split_at_first_false(str.isalpha, "abc1a2b")
# (['a', 'b', 'c'], ['1', 'a', '2', 'b'])
It works, but it doesn't feel right. There should be a better way to do this!
EDIT: I ended up开发者_如何学Go with using a slightly modified version of senderle's final suggestion after reviewing the answers:
from itertools import chain
def split_at_pred(pred, seq):
head = []
it = iter(seq)
for i in it:
if not pred(i):
head.append(i)
else:
return iter(head), chain([i], it)
return iter(head), iter([])
It's short and elegant, output is two iterators no matter the input (strings, lists, iterators), and as a bonus, it even works with the following input:
from itertools import count
split_at_pred(lambda x: x == 5, count())
The other solutions, those that work at all with iterators, will run out of memory with this input. (Note that this is just a bonus. Infinite iterators was something I hadn't even considered when I wrote this question)
This seems like a job for itertools.
>>> first = list(itertools.takewhile(str.isalpha, l))
>>> second = list(itertools.dropwhile(str.isalpha, l))
>>> first
['a', 'b', 'c']
>>> second
['1', 'a', '2', 'b']
This needs to be altered if l
is an iterator rather than a sequence.
>>> def bisect_iter(pred, i):
... i1, i2 = itertools.tee(i)
... return itertools.takewhile(pred, i1), itertools.dropwhile(pred, i2)
...
>>> i1, i2 = bisect_iter(str.isalpha, iter(l))
>>> list(i1)
['a', 'b', 'c']
>>> list(i2)
['1', 'a', '2', 'b']
The downside of tee
is that the initial values are cached and tested twice (by both takewhile
and dropwhile
). That's wasteful. But caching values is unavoidable if you want to both accept and return iterators.
However, if you can return lists from an iterator, I can think of one solution that doesn't make extra copies or tests, and it's very close to yours:
>>> def bisect_iter_to_list(pred, it):
... l1 = []
... for i in it:
... if pred(i):
... l1.append(i)
... else:
... l2 = [i]
... l2.extend(it)
... return l1, l2
...
>>> bisect_iter_to_list(str.isalpha, iter(l))
(['a', 'b', 'c'], ['1', 'a', '2', 'b'])
The only sneaky bit is that where there would normally be a break
statement (i.e. after the else
clause), I've simply consumed the iterator, causing the for
loop to terminate early.
Finally, if you still want to return iterators, but don't want to do extra tests, here's a variation on the above that I believe is optimal.
>>> def bisect_any_to_iter(pred, it):
... it = iter(it)
... head = []
... for i in it:
... if pred(i):
... head.append(i)
... else:
... tail = itertools.chain([i], it)
... break
... return iter(head), tail
...
>>> a, b = bisect_iter_to_iter(str.isalpha, iter(l))
>>> list(a)
['a', 'b', 'c']
>>> list(b)
['1', 'a', '2', 'b']
How about this?
def split_at_first_false(pred, seq):
for i, item in enumerate(seq):
if not pred(item):
return seq[:i], seq[i:]
What about this?
def split_at_first_false(pred, seq):
pos = 0
for item in seq:
if not pred(item):
return seq[:pos], seq[pos:]
pos += 1
Don't shy away from iterators, this is a perfect case for using one. Once the first false item is hit, use the same iterator to just fill out the rest of the items into the second list.
def split_at_false(pred, seq):
# if seq is not already an iterator, make it one
if not hasattr(seq,'next'):
seq = iter(seq)
first, second = [], []
for item in seq:
if not pred(item):
second.append(item)
break
first.append(item)
# at this point, seq points to the first item
# after the false item, just add it and all the
# rest to the second list
second.extend(seq)
return first, second
is_odd = lambda x : x % 2
print split_at_false(is_odd, [1])
print split_at_false(is_odd, [1,2,3,4,5])
print split_at_false(is_odd, [2,3,4,5,6])
print split_at_false(is_odd, [])
Prints:
([1], [])
([1], [2, 3, 4, 5])
([], [2, 3, 4, 5, 6])
([], [])
No tee'ing, no extra list storage, no iterating twice over the list, no slicing, just an iterator.
Try that:
def split_at_first_false(pred, seq):
index = 0
while index < len(seq):
if not pred(seq[index]):
return seq[:index], seq[index+1:]
index+=1
Try the following code :
data = "abc1a2b"
def split_at_first_false(pred, seq):
if not isinstance(seq, list):
seq = list(seq)
for i,x in enumerate(seq):
if not pred(x):
return seq[:i], seq[i:]
return seq, []
精彩评论