Python: Parsing a colon delimited file with various counts of fields

2023-02-07 21:52 问答作者：

I'm trying to parse a a few files with the following format in 'clientname'.txt

hostname:comp1
time: Fri Jan 28 20:00:02 GMT 2011
ip:xxx.xxx.xx.xx
fs:good:45
memory:ba开发者_如何学JAVAd:78
swap:good:34
Mail:good

Each section is delimited by a : but where lines 0,2,6 have 2 fields... lines 1,3-5 have 3 or more fields. (A big issue I've had trouble with is the time: line, since 20:00:02 is really a time and not 3 separate fields.

I have several files like this that I need to parse. There are many more lines in some of these files with multiple fields.

...
for i in clients:
if os.path.isfile(rpt_path + i + rpt_ext):          # if the rpt exists then do this
    rpt = rpt_path + i + rpt_ext
    l_count = 0
    for line in open(rpt, "r"):
        s_line = line.rstrip()
        part = s_line.split(':')
        print part
        l_count = l_count + 1
else:                                               # else break
    break

First I'm checking if the file exists first, if it does then open the file and parse it (eventually) As of now I'm just printing the output (print part) to make sure it's parsing right. Honestly, the only trouble I'm having at this point is the time: field. How can I treat that line specifically different than all the others? The time field is ALWAYS the 2nd line in all of my report files.

split method has the following syntax split( [sep [,maxsplit]]) and if the maxsplit is given, it will make maxsplit+1 parts. In you case, you just have give maxsplit as 1. Just split(':',1) would solve your problem.

If time is a special case, you could do:

[...]
s_line = line.rstrip()
if line.startswith('time:'):
    part = s_line.split(':', 1)
else:
    part = s_line.split(':')
print part
[...]

This would give you:

['hostname', 'comp1']
['time', ' Fri Jan 28 20:00:02 GMT 2011']
['ip', 'xxx.xxx.xx.xx']
['fs', 'good', '45']
['memory', 'bad', '78']
['swap', 'good', '34']
['Mail', 'good']

And doesn't rely on the position of time in the file.

Design considerations:

Robustly handle extraneous whitespace, including blank lines, and missing colons.

Extract a record_type, which is then used to decide how to parse the remainder of the line.

>>> def munched(s, n=None):
...     if n is None:
...         n = 99999999 # this kludge should not be necessary
...     return [x.strip() for x in s.split(':', n)]
...
>>> def parse_line(line):
...     if ':' not in line:
...         return [line.strip(), '']
...     record_type, remainder = munched(line, 1)
...     if record_type == 'time':
...         data = [remainder]
...     else:
...         data = munched(remainder)
...     return record_type, data
...
>>> for guff in """
... hostname:comp1
... time: Fri Jan 28 20:00:02 GMT 2011
... ip:xxx.xxx.xx.xx
... fs:good:45
...     memory   :    bad   :   78
... missing colon
... Mail:good""".splitlines(True):
...    print repr(guff), parse_line(guff)
...
'\n' ['', '']
'hostname:comp1\n' ('hostname', ['comp1'])
'time: Fri Jan 28 20:00:02 GMT 2011\n' ('time', ['Fri Jan 28 20:00:02 GMT 2011'])
'ip:xxx.xxx.xx.xx\n' ('ip', ['xxx.xxx.xx.xx'])
'fs:good:45\n' ('fs', ['good', '45'])
'    memory   :    bad   :   78    \n' ('memory', ['bad', '78'])
'missing colon\n' ['missing colon', '']
'Mail:good' ('Mail', ['good'])
>>>

If the time field always the 2nd line. Why can't you skip it and parse it separately?

Something like

for i, line in enumerate(open(rpt, "r").read().splitlines()):
    if i==1: # Special parsing for time: line
        data = line[5:]
    else:
        # your normal parsing logic

继续阅读：delimiter file parsing python

Python: Parsing a colon delimited file with various counts of fields

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？