How do I split a huge file into multiple files?

What's the easiest way to do this without running out of memory?

I have a 9GB file that has 100 million lines (each is a URL).

How can I split this up into X files? I tried for f in fileinput.input('...'), but it got "killed" for some reason.


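import itertools as it

YOUR_FILENAME = 'bigfile.log'
SPLIT_NAME = 'bigfile.part%05d.log'
SPLIT_SIZE = 10000  # lines per output file
# Key for groupby: t is (line_index, line); lines 0..SPLIT_SIZE-1 -> part 0, etc.
SPLITTER = lambda t: t[0] // SPLIT_SIZE

# Iterating over the file object reads one line at a time, and each group
# is written out as it is produced, so memory use stays small.
with open(YOUR_FILENAME, "r") as input_file:
    for part_no, lines in it.groupby(enumerate(input_file), SPLITTER):
        with open(SPLIT_NAME % part_no, "w") as out:
            out.writelines(line for _, line in lines)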
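To see how the SPLITTER key assigns lines to parts, here is the same groupby expression run on a small in-memory list instead of a file (the list and the tiny SPLIT_SIZE are illustrative only). groupby only merges consecutive items with the same key, which works here because index // SPLIT_SIZE never decreases; each group must also be consumed before the loop advances, which the script above does by writing it out immediately.

```python
import itertools as it

SPLIT_SIZE = 3  # tiny part size, for illustration only
SPLITTER = lambda t: t[0] // SPLIT_SIZE  # same key as in the answer

# Hypothetical stand-in for the lines of the input file.
lines = ["http://example.com/%d\n" % i for i in range(8)]

# Materialise each group as a list just to inspect it; the real script
# streams each group straight to its output file instead.
parts = [
    (part_no, [line for _, line in group])
    for part_no, group in it.groupby(enumerate(lines), SPLITTER)
]

for part_no, chunk in parts:
    print(part_no, len(chunk))
```

With 8 lines and SPLIT_SIZE = 3, this yields parts 0, 1 and 2 holding 3, 3 and 2 lines.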

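Set YOUR_FILENAME to the input file, SPLIT_SIZE to the number of lines each part should contain, and SPLIT_NAME to the output name pattern (%05d is replaced by the part number, starting at 0). Then run it. Both names can be full paths, not just plain filenames. Because the input is read line by line and each part is written out before the next one starts, only a small amount of data is held in memory at any time.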
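The question asks for X files, whereas the script above fixes the number of lines per part. If you need exactly X parts, one option is a two-pass sketch: count the lines in a first streaming pass, then give line i the part number i * X // total, which yields X files whose line counts differ by at most one. The helpers count_lines and split_into_n_files are hypothetical names, and the sketch assumes the input can be read twice.

```python
import itertools as it
import os
import tempfile


def count_lines(path):
    """First pass: count lines while holding only one line in memory."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


def split_into_n_files(path, out_pattern, n_parts):
    """Split `path` into n_parts files whose line counts differ by at most one.

    Hypothetical helper; assumes the input can be read twice. If the file
    has fewer than n_parts lines, only one file per line is written.
    Returns the list of output paths.
    """
    total = count_lines(path)
    if total == 0:
        return []
    n_parts = min(n_parts, total)  # avoid empty parts / gaps in numbering
    written = []
    with open(path, "r") as input_file:
        # Line i goes to part i * n_parts // total; this never decreases,
        # so groupby sees each part as one consecutive run.
        for part_no, group in it.groupby(
            enumerate(input_file), lambda t: t[0] * n_parts // total
        ):
            out_path = out_pattern % part_no
            with open(out_path, "w") as out:
                out.writelines(line for _, line in group)
            written.append(out_path)
    return written


if __name__ == "__main__":
    # Small demo in a temporary directory instead of the real 9GB file.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "bigfile.log")
        with open(src, "w") as f:
            f.writelines("http://example.com/%d\n" % i for i in range(10))
        pattern = os.path.join(tmp, "bigfile.part%05d.log")
        for part in split_into_n_files(src, pattern, 3):
            print(os.path.basename(part), count_lines(part))
```

With the 10-line demo file and n_parts = 3, the parts get 4, 3 and 3 lines. The trade-off against the single-pass script is one extra read of the input, in exchange for a fixed number of output files.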
