Python: parsing a list of strings

2023-02-25 21:58 Q&A author:

I have a list of strings, and I'm looking for lines like this:

Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10

After finding such a line, I want to store it as a dictionary, {'key': 'af12d9', 'index': 0, 'field 1': ...}, and then append that dictionary to a list, so that I end up with a list of dictionaries.

I was able to get it working like this:

from re import findall

listconfig = []
for line in list_of_strings:
    # One alternative per field: findall() returns one 5-tuple per hit,
    # in which only the group of the matching alternative is non-empty
    matched = findall(r"(Key:[\s]*[0-9A-Fa-f]+[\s]*)|(Index:[\s]*[0-9]+[\s]*)|(Field 1:[\s]*[0-9]+[\s]*)|(Field 2:[\s]*[0-9]+[\s]*)|(Field 3:[\s]*[-+]?[0-9]+[\s]*)", line)
    if matched:
        listconfig += [dict(map(lambda pair: (pair[0].strip().lower(), pair[1].strip().lower()),
                                map(lambda hit: hit[0].split(':'),
                                    [[x for x in group if x] for group in matched])))]

I'm just wondering whether there is a better (shorter and more efficient) way to do this, because I think findall will do 5 searches per string. (Is that correct? It returns a list of 5 tuples.)

Thank you.
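A minimal sketch for checking that hunch: findall() makes a single left-to-right pass over the line, trying the five alternatives at each position, and returns one tuple per non-overlapping match. Running the question's pattern through re.finditer(), which walks the string the same way but yields match objects, makes this visible; Match.lastindex reports which alternative matched each time.

```python
import re

# The five-alternative pattern from the question
PATTERN = (r"(Key:[\s]*[0-9A-Fa-f]+[\s]*)|(Index:[\s]*[0-9]+[\s]*)|"
           r"(Field 1:[\s]*[0-9]+[\s]*)|(Field 2:[\s]*[0-9]+[\s]*)|"
           r"(Field 3:[\s]*[-+]?[0-9]+[\s]*)")

line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10"

# finditer() scans the line once, yielding one match per non-overlapping hit
for m in re.finditer(PATTERN, line):
    # lastindex is the number of the (only) group that took part in this match
    print(m.start(), m.lastindex, repr(m.group()))
```

So the five tuples come from five consecutive hits in one scan, not from five separate searches over the whole line.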

Solution:

OK, with the help of brandizzi, I have found THE answer to this question.

Solution:

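import re

listconfig = []
for line in list_of_strings:
    # "Key" is required, every other field is optional. The lookahead in
    # field_2 stops it before "Field 3:"; otherwise the greedy [0-9 A-Za-z]+
    # would swallow that label and field_3 would come back as None.
    matched = re.search(r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*"
                        r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?"
                        r"(Field 1:[\s]*(?P<field_1>[0-9]+)[\s]*)?"
                        r"(Field 2:[\s]*(?P<field_2>(?:(?!Field 3:)[0-9 A-Za-z])+)[\s]*)?"
                        r"(Field 3:[\s]*(?P<field_3>[-+]?[0-9]+)[\s]*)?", line)
    if matched:
        print(matched.groupdict())
        listconfig.append(matched.groupdict())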
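Because every field after "Key" is an optional group, groupdict() reports a field that is missing from a line as None. A small sketch that keeps the same "Key required, rest optional" idea but drops the None entries; the helper name parse_lines is hypothetical, and Field 2 is assumed here to be a single token so it cannot run into the "Field 3:" label.

```python
import re

# "Key" is required; the other fields are optional, non-capturing wrappers
# around named groups. Field 2 is assumed to be a single token.
LINE_RE = re.compile(
    r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*"
    r"(?:Index:\s*(?P<index>[0-9]+)\s*)?"
    r"(?:Field 1:\s*(?P<field_1>[0-9]+)\s*)?"
    r"(?:Field 2:\s*(?P<field_2>[0-9A-Za-z]+)\s*)?"
    r"(?:Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*)?"
)


def parse_lines(lines):
    """Return one dict per matching line, leaving out fields that are absent."""
    result = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            # optional groups that did not take part in the match are None
            result.append({k: v for k, v in m.groupdict().items() if v is not None})
    return result
```

For example, a line such as "Key: 00ff Index: 3" yields {'key': '00ff', 'index': '3'} instead of a dict padded with None values.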

Firstly, your regex does not seem to work properly. The Key field can have values that include f, right? So its group should not be ([0-9A-Ea-e]+) but ([0-9A-Fa-f]+). Also, it is good (actually, wonderful) practice to prefix regex strings with r, because raw strings avoid problems with backslash escapes. (If you do not understand why, read up on raw strings.)

Now, my approach to the problem. First, I would create a regex without pipes:

>>> import re
>>> line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
>>> regex = r"(Key):[\s]*([0-9A-Fa-f]+)[\s]*" \
...     r"(Index):[\s]*([0-9]+)[\s]*" \
...     r"(Field 1):[\s]*([0-9]+)[\s]*" \
...     r"(Field 2):[\s]*([0-9 A-Za-z]+)[\s]*" \
...     r"(Field 3):[\s]*([-+]?[0-9]+)[\s]*"

With this change, findall() returns a single tuple of groups for the whole line, and in that tuple each key is followed by its value:

>>> re.findall(regex, line)
[('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')]

So I get the tuple...

>>> found = re.findall(regex, line)[0]
>>> found
('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')

...and using slices I get only the keys...

>>> found[::2]
('Key', 'Index', 'Field 1', 'Field 2', 'Field 3')

...and also only the values:

>>> found[1::2]
('af12d9', '0', '1234', '1234 Ring ', '-10')

Then I pair each key with its value using the zip() function. (In Python 3, zip() returns an iterator, so I wrap it in list() to display the pairs.)

>>> list(zip(found[::2], found[1::2]))
[('Key', 'af12d9'), ('Index', '0'), ('Field 1', '1234'), ('Field 2', '1234 Ring '), ('Field 3', '-10')]

The grand finale is to pass the list of tuples to the dict() constructor:

>>> dict(zip(found[::2], found[1::2]))
{'Field 3': '-10', 'Index': '0', 'Field 1': '1234', 'Key': 'af12d9', 'Field 2': '1234 Ring '}

I find this solution the best, but it is indeed a subjective question in some sense. HTH anyway :)
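Put together over the whole list of strings, the same steps might look like the sketch below. It assumes re.search() in place of findall()[0] (so a non-matching line is simply skipped instead of raising IndexError), and it lowercases labels and strips values to produce the 'key'/'field 1' style asked for in the question; lines_to_dicts is a hypothetical name.

```python
import re

# brandizzi's pipe-free pattern: each label is captured right next to its value
REGEX = re.compile(
    r"(Key):[\s]*([0-9A-Fa-f]+)[\s]*"
    r"(Index):[\s]*([0-9]+)[\s]*"
    r"(Field 1):[\s]*([0-9]+)[\s]*"
    r"(Field 2):[\s]*([0-9 A-Za-z]+)[\s]*"
    r"(Field 3):[\s]*([-+]?[0-9]+)[\s]*"
)


def lines_to_dicts(list_of_strings):
    listconfig = []
    for line in list_of_strings:
        m = REGEX.search(line)
        if m:
            found = m.groups()
            # even slots hold the labels, odd slots hold the values
            listconfig.append({label.lower(): value.strip()
                               for label, value in zip(found[::2], found[1::2])})
    return listconfig
```

The strip() matters for Field 2, whose greedy [0-9 A-Za-z]+ leaves a trailing space ('1234 Ring ').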

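import re

str_list = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
results = {}
# Each key runs up to ": " and each value up to the next space; the extra
# space appended to the line lets the last value match as well.
for match in re.findall(r"(.*?): (.*?) ", str_list + ' '):
    results[match[0]] = match[1]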
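The snippet above handles a single line. Applied to every string in the list, it could be sketched as follows; parse_pairs is a hypothetical helper, and since findall() with two groups returns (key, value) tuples, they can go straight into dict().

```python
import re

# Key: everything up to ": "; value: everything up to the next space
PAIR_RE = re.compile(r"(.*?): (.*?) ")


def parse_pairs(list_of_strings):
    listconfig = []
    for line in list_of_strings:
        # pad with a space so the last value is also followed by one
        pairs = PAIR_RE.findall(line + " ")
        if pairs:
            listconfig.append(dict(pairs))
    return listconfig
```

This only works when every value is a single word: with the "1234 Ring" data, Field 2 becomes '1234' and the next key is read as 'Ring Field 3'.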

The pattern in your example probably does not match your example data because of the "Ring". Here is some code which might help:

import re
# the keys to look for
keys = ['Key','Index','Field 1','Field 2','Field 3']
# a pattern for those keys in exact order
pattern = ''.join(["(%s):(.*)" % key for key in keys])
# sample data
data = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
# look for the pattern
hit = re.match(pattern,data)
if hit:
    # get the matched elements
    groups = hit.groups()
    # group them in pairs and create a dict
    d = dict(zip(groups[::2], groups[1::2]))
    # print result
    print(d)
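A variation on the keys-driven pattern above, as a sketch: the greedy (.*) groups leave the surrounding spaces in each value (' af12d9 ', ' 1234 Ring '), so this version strips them, and it passes each key through re.escape() in case a label ever contains a regex metacharacter. parse_line and KEYS are hypothetical names.

```python
import re

KEYS = ['Key', 'Index', 'Field 1', 'Field 2', 'Field 3']


def parse_line(line, keys):
    """Match `keys` in the given order and return a dict with stripped values."""
    # re.escape() keeps labels literal even if they contain metacharacters
    pattern = "".join(r"(%s):(.*)" % re.escape(key) for key in keys)
    hit = re.match(pattern, line)
    if not hit:
        return None
    groups = hit.groups()
    # labels sit in the even slots, raw values in the odd slots
    return {k: v.strip() for k, v in zip(groups[::2], groups[1::2])}


data = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
print(parse_line(data, KEYS))
```

For a whole list, something like [d for d in (parse_line(s, KEYS) for s in list_of_strings) if d] collects only the lines that matched.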

You could use a parser library. I know Lepl, so I will use that, but because it is implemented in Python it will not be as efficient. However, the solution is fairly short and, I hope, very easy to understand:

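from lepl import *

def parser():
    # build every matcher inside the block so spaces between tokens are dropped
    with DroppedSpaces():
        key = (Drop("Key:") & Regexp("[0-9a-fA-F]+")) > 'key'
        index = (Drop("Index:") & Integer()) > 'index'
        def Field(n):
            # the label in the data is "Field 1:", with a space and a colon
            return (Drop("Field %d:" % n) & Integer()) > 'field' + str(n)
        # make_dict turns the (name, value) pairs of one line into a dict
        line = (key & index & Field(1) & Field(2) & Field(3)) > make_dict
        return line[:]
p = parser()
print(p.parse_file(...))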

It should also be relatively simple to handle a variable number of fields.

Note that the above is not tested (I need to get to work), but should be about right. In particular, it should return a list of dictionaries, as required.
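The "variable number of fields" idea can also be sketched without a parser library, using only the standard re module (this is a different technique from Lepl, not a translation of it). It assumes every label is a capitalised word optionally followed by a number ("Key", "Index", "Field 1", ...) and that a value runs until the next such label or the end of the line; parse_any_fields is a hypothetical name.

```python
import re

# Assumed label shape: capitalised word, optionally followed by " <digits>"
LABEL = r"[A-Z][A-Za-z]*(?: \d+)?"
# A value is matched lazily up to the next "Label:" or the end of the line
PAIR_RE = re.compile(r"(%s):\s*(.*?)\s*(?=%s:|$)" % (LABEL, LABEL))


def parse_any_fields(line):
    """Return a dict of every "Label: value" pair found in the line."""
    return dict(PAIR_RE.findall(line))


print(parse_any_fields("Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"))
```

The trade-off is that a value containing something that looks like a label followed by a colon would be split there.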

Your solution would perform better if you did this[*]:

import re

regex = re.compile(flags=re.VERBOSE, pattern=r"""
    Key:\s*(?P<key>[0-9A-Fa-f]+)\s*
    Index:\s*(?P<index>[0-9]+)\s*
    Field\s+1:\s*(?P<field_1>[0-9]+)\s*
    Field\s+2:\s*(?P<field_2>[0-9A-Za-z]+)\s*
    Field\s+3:\s*(?P<field_3>[-+]?[0-9]+)\s*
""")

list_of_strings = [
    'Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10',
    'hey joe!',
    ''
]

# map() is lazy in Python 3, like itertools.imap() was in Python 2
listconfig = [
    match.groupdict() for match in map(regex.search, list_of_strings) if match
]

It would also be more succinct, and I fixed your broken regex pattern.

BTW, the result of the above would be:

[{'index': '0', 'field_2': '1234', 'field_3': '-10', 'key': 'af12d9', 'field_1': '1234'}]
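Named groups must be valid Python identifiers, which is why the keys come out as 'field_1' rather than the 'field 1' the question asks for. If that exact format matters, a sketch that renames the keys after matching; to_question_keys is a hypothetical helper.

```python
import re

regex = re.compile(r"""
    Key:\s*(?P<key>[0-9A-Fa-f]+)\s*
    Index:\s*(?P<index>[0-9]+)\s*
    Field\s+1:\s*(?P<field_1>[0-9]+)\s*
    Field\s+2:\s*(?P<field_2>[0-9A-Za-z]+)\s*
    Field\s+3:\s*(?P<field_3>[-+]?[0-9]+)\s*
""", re.VERBOSE)


def to_question_keys(match):
    """Turn group names like 'field_1' into the question's 'field 1' style."""
    return {name.replace("_", " "): value for name, value in match.groupdict().items()}


list_of_strings = [
    'Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10',
    'hey joe!',
    '',
]
listconfig = [to_question_keys(m) for m in map(regex.search, list_of_strings) if m]
print(listconfig)
```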

[*] Actually, no, it wouldn't. I timeit'ed both and neither is faster than the other. Still, I like mine better.
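For anyone who wants to repeat that comparison, a sketch of how the two styles could be timed with the standard timeit module. The helper names, the test data and the repetition count are assumptions, the output dict keys differ slightly between the two versions, and no particular timing result is implied.

```python
import re
import timeit

LINE = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10"
LINES = [LINE, "hey joe!", ""] * 1000  # assumed test data

# Question's style: one alternative per field
ALTERNATION = re.compile(r"(Key:\s*[0-9A-Fa-f]+\s*)|(Index:\s*[0-9]+\s*)|"
                         r"(Field 1:\s*[0-9]+\s*)|(Field 2:\s*[0-9]+\s*)|"
                         r"(Field 3:\s*[-+]?[0-9]+\s*)")
# Answer's style: one pattern for the whole line with named groups
SINGLE = re.compile(r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*Index:\s*(?P<index>[0-9]+)\s*"
                    r"Field 1:\s*(?P<field_1>[0-9]+)\s*Field 2:\s*(?P<field_2>[0-9]+)\s*"
                    r"Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*")


def with_alternation(lines):
    out = []
    for line in lines:
        hits = ALTERNATION.findall(line)
        if hits:
            # each hit tuple has exactly one non-empty group: "Label: value"
            pairs = (next(g for g in hit if g).split(":", 1) for hit in hits)
            out.append({k.strip().lower(): v.strip() for k, v in pairs})
    return out


def with_single_search(lines):
    return [m.groupdict() for m in map(SINGLE.search, lines) if m]


if __name__ == "__main__":
    for fn in (with_alternation, with_single_search):
        # total seconds for 10 runs over LINES
        print(fn.__name__, timeit.timeit(lambda: fn(LINES), number=10))
```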
