What is a more efficient way in Python to return list elements which are not in a second list?
Is there a faster way to do this in Python?
[f for f in list_1 if not f in list_2]
list_1 and list_2 both consist of about 120,000 strings. It takes about 4 minutes to generate the new list.
If you put list_2 into a set, it should make the containment checking a lot quicker:
s = set(list_2)
[f for f in list_1 if f not in s]
This is because x in list is an O(n) check that scans the list element by element, while x in set is a hash lookup that takes constant time on average.
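To see the difference for yourself, here is a rough timing sketch. The sample data is made up and much smaller than the 120,000 strings in the question, and the function names are only for illustration; absolute timings will vary by machine.

```python
import timeit

# Hypothetical sample data, far smaller than the lists in the question.
list_1 = [f"item{i}" for i in range(2000)]
list_2 = [f"item{i}" for i in range(1000, 3000)]


def filter_with_list():
    # Each membership test scans list_2 from the start: O(len(list_2)).
    return [f for f in list_1 if f not in list_2]


def filter_with_set():
    # Building the set costs O(len(list_2)) once;
    # each lookup afterwards is O(1) on average.
    lookup = set(list_2)
    return [f for f in list_1 if f not in lookup]


# Both approaches return the same elements in the same order.
assert filter_with_list() == filter_with_set()

list_time = timeit.timeit(filter_with_list, number=5)
set_time = timeit.timeit(filter_with_set, number=5)
print(f"list lookup: {list_time:.4f}s  set lookup: {set_time:.4f}s")
```

The gap widens as the lists grow, because the list version does work proportional to len(list_1) * len(list_2), while the set version does work proportional to len(list_1) + len(list_2).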
Another way is to use set-difference:
list(set(list_1).difference(set(list_2)))
However, this probably won't be faster than the first way. It will also eliminate duplicates from list_1 and discard the original order, which you may not want.
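A small illustration of how the two approaches differ when list_1 contains duplicates. The sample values here are invented for the example:

```python
# Hypothetical sample data with duplicates in list_1.
list_1 = ["a", "b", "a", "c", "d", "c"]
list_2 = ["b", "d"]

# List comprehension with a set lookup: keeps duplicates and order.
lookup = set(list_2)
kept_in_order = [f for f in list_1 if f not in lookup]
print(kept_in_order)  # ['a', 'a', 'c', 'c']

# Set difference: each surviving element appears once, in no particular order.
unique_only = set(list_1).difference(list_2)
print(sorted(unique_only))  # ['a', 'c'] (sorted only to make the output stable)
```

Note that difference() accepts any iterable, so wrapping list_2 in set() first is optional.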
Depending on what you want to do with the new list, lazy evaluation with itertools.ifilter() might be sufficient, so you don't spend time building the new list beforehand. (itertools.ifilter() is Python 2 only; on Python 3 the built-in filter() is already lazy.) In any case, you should convert list_2 to a set first, so that each lookup is O(1):
import itertools
set_2 = set(list_2)
# Python 2 only; on Python 3 use the built-in filter() instead
for f in itertools.ifilter(lambda x: x not in set_2, list_1):
    pass  # do something with f
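On Python 3, the same lazy approach could look roughly like this. This is a sketch with invented sample data; the `seen` list just stands in for whatever you do with each element.

```python
# Hypothetical sample data.
list_1 = ["a", "b", "c", "d"]
list_2 = ["b", "d"]

set_2 = set(list_2)

# Generator expression: nothing is filtered until the loop asks for items.
remaining = (f for f in list_1 if f not in set_2)

seen = []
for f in remaining:
    seen.append(f)  # stand-in for "do something with f"

print(seen)  # ['a', 'c']

# The built-in filter() is also lazy on Python 3.
also_lazy = filter(lambda x: x not in set_2, list_1)
print(list(also_lazy))  # ['a', 'c']
```

Either form avoids building the full result list up front, which helps if you only need to iterate once or may stop early.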