Using mmap with popen

I need to read in and process a bunch of ~40 MB gzipped text files, and I need it done fast and with minimal I/O overhead (as the volumes are used by others as well). The fastest way I've found thus far for this task looks like this:

from subprocess import Popen, PIPE

def gziplines(fname):
    f = Popen(['zcat', fname], stdout=PIPE)
    for line in f.stdout:
        yield line

and then:

for line in gziplines(filename):
    dostuff(line)

but what I would like to do (IF this is faster?) is something like this:

def gzipmmap(fname): 
    f = Popen(['zcat', fname], stdout=PIPE)
    m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
    return m

sadly, when I try this, I get this error:

>>> m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
mmap.error: [Errno 19] No such device

even though, when I try:

>>> f.stdout.fileno()
4

So, I think I have a basic misunderstanding of what is going on here. :(

The two questions are:

1) Would this mmap be a faster way of putting the whole file into memory for processing?

2) How can I achieve this?

Thank you very much... everyone here has been incredibly helpful already! ~Nik


From the mmap(2) man page:

   ENODEV The  underlying  file system of the specified file does not sup-
          port memory mapping.

You cannot mmap streams, only real files or anonymous swap space. You will need to read from the stream into memory yourself.
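
As a rough sketch of "reading from the stream into memory yourself": the helper below runs a decompressor and collects everything it writes to the pipe into one bytes object. The name read_decompressed and its cmd parameter are made up for illustration, and the default of gzip -dc (equivalent to zcat on most systems) is an assumption about what is installed.

```python
import subprocess

def read_decompressed(fname, cmd=("gzip", "-dc")):
    """Run a decompressor on fname and return its entire output as bytes.

    read_decompressed and cmd are hypothetical names, not part of any library.
    """
    proc = subprocess.Popen(list(cmd) + [fname], stdout=subprocess.PIPE)
    # communicate() reads the pipe until EOF and then waits for the child.
    data, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("decompressor exited with status %d" % proc.returncode)
    return data
```

You could then loop with `for line in read_decompressed(filename).splitlines(): dostuff(line)`. Note that this holds the whole decompressed file in memory at once, and it is not obviously faster than iterating over the pipe line by line, so it is worth timing both on your data.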


Pipes aren't mmapable.

case MAP_PRIVATE:
        ...
        if (!file->f_op || !file->f_op->mmap)
                return -ENODEV;

and a pipe's file operations do not contain an mmap hook.
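
If you really want an mmap object, one possible workaround (a sketch, not a recommendation) is to have the decompressor write into a regular temporary file and map that instead of the pipe. The name gzipmmap_via_tempfile and the cmd parameter are hypothetical, and the gzip -dc default is an assumption. This writes the decompressed data to wherever your temporary directory lives, so it may well cost more I/O than streaming from the pipe.

```python
import mmap
import subprocess
import tempfile

def gzipmmap_via_tempfile(fname, cmd=("gzip", "-dc")):
    """Decompress fname into an unnamed temporary file and return a read-only mmap of it.

    gzipmmap_via_tempfile and cmd are hypothetical names used for illustration.
    """
    with tempfile.TemporaryFile() as tmp:
        # The child writes straight into the regular file, so no pipe is involved.
        subprocess.check_call(list(cmd) + [fname], stdout=tmp)
        # Length 0 maps the whole file. mmap keeps its own duplicate of the
        # descriptor, so the mapping stays usable after tmp is closed.
        # An empty result raises ValueError, since an empty file cannot be mapped.
        return mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
```

The returned object supports slicing and readline(), e.g. `m = gzipmmap_via_tempfile(filename)` followed by `m.readline()`; call `m.close()` when done.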
