Python: regex on big file. Easy way?
I need to run a regex match over a file, but I'm faced with an unexpected problem: the file is too big to read() or mmap() in one call, file objects don't support the buffer() interface, and the regex module takes only strings or buffers.
Is there an easy way to do this?
The Python mmap module provides a nice Python-friendly way of memory mapping a file. On a 32-bit operating system, the mapping has to fit in the process's address space, so the maximum file size will be limited to no more than a GB or maybe two, but on a 64-bit OS you will be able to memory map a file of arbitrary size (until storage sizes exceed 2^64 bytes, of course).
I've done this with files of up to 30 GB (the Wikipedia XML dump file) in Python with excellent results.
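As a rough sketch of what this looks like in practice (assuming Python 3, where the re module accepts an mmap object directly as long as the pattern is a bytes pattern), something like the following should work. The function name find_matches and its parameters are made up for illustration, not part of any library.

```python
import mmap
import re


def find_matches(path, pattern):
    """Yield (offset, matched_bytes) for each match of a bytes pattern in a file.

    Hypothetical helper for illustration: `path` is the file to scan and
    `pattern` must be a bytes pattern (e.g. rb"\d+"), because an mmap
    object exposes the file as bytes, not as text.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        # Note: mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # finditer scans the mapping lazily; the OS pages data in on demand,
            # so the file is never read into memory in one piece.
            for match in regex.finditer(mm):
                yield match.start(), match.group(0)
```

You would call it as, say, for offset, text in find_matches("dump.xml", rb"<title>[^<]*</title>"), where the file name and pattern are placeholders. If the file is text, decode each matched bytes value yourself with the file's encoding.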