Python: regex on big file. Easy way?

I need to run a regex match over a file, but I'm faced with an unexpected problem: the file is too big to read() or mmap() in one call, file objects don't support the buffer() interface, and the regex module takes only strings or buffers.

Is there an easy way to do this?


The Python mmap module provides a nice Python-friendly way of memory mapping a file. On a 32-bit operating system, the maximum file size will be limited to a gigabyte or two, but on a 64-bit OS you will be able to memory map a file of arbitrary size (until storage sizes exceed 2^64 bytes, of course).

I've done this with files of up to 30 GB (the Wikipedia XML dump file) in Python with excellent results.
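
Here is a minimal sketch of the idea, assuming Python 3 on a 64-bit OS. The helper name find_matches is made up for illustration. In Python 3 the mapped file behaves like bytes, so the pattern has to be a bytes pattern (rb"..."), and the whole result list is built in memory, so for a very large number of matches you would want to process each match inside the loop instead.

```python
import mmap
import re

def find_matches(path, pattern):
    """Return a list of (offset, matched bytes) for a bytes regex over a file.

    Hypothetical helper for illustration; 'pattern' must be a bytes pattern.
    Note that mmap raises ValueError for an empty file.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The OS pages the file in on demand, so it is never read() whole.
            return [(m.start(), m.group()) for m in regex.finditer(mm)]
```

Calling it looks like find_matches("dump.xml", rb"<title>(.*?)</title>"), where "dump.xml" is just a placeholder path.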
