Filtering / iterating through very large lists in Python
If I have a list with, say, 10 million objects, how do I filter the list quickly? It takes about 4-5 seconds for a complete iteration through a list comprehension. Are there any efficient data structures or libraries for this in Python? Or is Python not suited for large sets of data?
If your data is numeric and of a uniform type, and speed is your primary goal (and you want to stay in Python), use a NumPy array.
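As a rough sketch of what that could look like, filtering a NumPy array is usually done with a boolean mask rather than a Python-level loop. The data and the "keep even numbers" condition below are made up for illustration, and the array is smaller than the 10 million items in the question to keep the example light.

```python
import numpy as np

# Hypothetical data: one million integers of a single dtype.
values = np.arange(1_000_000)

# The comparison runs in compiled code and yields a boolean array (the mask).
mask = values % 2 == 0

# Indexing with the mask keeps only the elements where the mask is True.
evens = values[mask]
```

The speedup comes from the per-element work happening inside NumPy instead of the interpreter, so it only applies when the filter condition can be written as array operations.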
The itertools module is designed for efficient looping. In particular, you might find that ifilter suits your purpose (in Python 3, the built-in filter is already lazy and ifilter no longer exists). Iterating through large data structures is always expensive, but if you only need some of the data at a time, lazy evaluation can help a lot.
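Here is a minimal sketch of lazy filtering, written for Python 3, where the built-in filter returns an iterator and plays the role that itertools.ifilter played in Python 2. The predicate is_multiple_of_seven and the input range are hypothetical and only there to show the idea.

```python
from itertools import islice

def is_multiple_of_seven(n):
    """Hypothetical predicate used only for this illustration."""
    return n % 7 == 0

numbers = range(10_000_000)

# Nothing is evaluated here: filter() just wraps the iterable.
lazy_matches = filter(is_multiple_of_seven, numbers)

# islice pulls only the first five matches, so only a handful of
# elements are ever tested instead of all ten million.
first_five = list(islice(lazy_matches, 5))
```

This only helps when you consume part of the result or process items one at a time. If you need every match in a list at the end, the total work is about the same as with a list comprehension.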
You can also try generator expressions, which usually look identical to their list comprehension counterparts (though how you use the result can differ), or a generator function. Both give you the benefits of lazy evaluation.
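To make the difference concrete, here is a small sketch comparing the two forms. The data and the even-number condition are assumptions chosen for illustration. The only syntactic difference is the brackets, but the list comprehension builds the whole result in memory, while the generator expression produces items one at a time as they are consumed.

```python
data = range(1_000_000)

# List comprehension: the full filtered list is built immediately.
evens_list = [x for x in data if x % 2 == 0]

# Generator expression: same syntax with parentheses, nothing computed yet.
evens_gen = (x for x in data if x % 2 == 0)

# Consuming the generator, here with sum(), drives the filtering lazily
# without ever holding all matches in memory at once.
total = sum(evens_gen)
```

Note that a generator can be consumed only once. After the sum above, evens_gen is exhausted, whereas evens_list can be iterated again.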
Even using the built-in functions on a very simple list of integers takes several seconds to evaluate on my computer.
>>> l=[1]*10000000
>>> s=filter(lambda x:True,l)
I'd suggest a different approach, such as NumPy, or lazy evaluation with generators and/or the itertools module.