Break a text file into chunks based on line like the string split operation?

2023-01-14 04:29 问答作者：

I h开发者_高级运维ave text report files I need to "split()" like strings are split up into arrays.

So the file is like:

BOBO:12341234123412341234
1234123412341234123412341
123412341234
BOBO:12349087609812340-98
43690871234509875
45

BOBO:32498714235908713248
0987235

And I want to create 3 sub-files out of that splitting on lines that begin with "^BOBO:". I don't really want 3 physical files, I'd prefer 3 different file pointers.

Perhaps use itertools.groupby:

import itertools

def bobo(x):    
    if x.startswith('BOBO:'):
        bobo.count+=1
    return bobo.count
bobo.count=0

with open('a') as f:
    for key,grp in itertools.groupby(f,bobo):
        print(key,list(grp))

yields:

(1, ['BOBO:12341234123412341234\n', '1234123412341234123412341\n', '123412341234\n'])
(2, ['BOBO:12349087609812340-98\n', '43690871234509875\n', '45\n', '\n'])
(3, ['BOBO:32498714235908713248\n', '0987235\n'])

Since you say you don't want physical files, the whole file must be able to fit in memory. In that case, to create file-like objects, use the cStringIO module:

import cStringIO
with open('a') as f:
    file_handles=[]
    for key,grp in itertools.groupby(f,bobo):
        file_handles.append(cStringIO.StringIO(''.join(grp)))

file_handles will be a list of file-like objects, one for each "BOBO:" stanza.

If you can deal with keeping them in memory to work with them something like this probably works:

subFileBlocks = []

with open('myReportFile.txt') as fh:
  for line in fh:
    if line.startswith('BOBO'):
      subFileBlocks.append(line)
    else:
      subFileBlocks[-1] += line

At the end of that subFileBlocks should contain your sections as strings.

继续阅读：python

Break a text file into chunks based on line like the string split operation?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？