Non-greedy parsing with pyparsing
I'm trying to parse a line with pyparsing. The line consists of a number of (key, values) pairs, and I'd like to get them back as a list of (key, values). A simple example:
ids = 12 fields = name
should result in something like: [('ids', '12'), ('fields', 'name')]
A more complex example:
ids = 12, 13, 14 fields = name, title
should result in something like: [('ids', '12, 13, 14'), ('fields', 'name, title')]
PS: the tuple inside the resulting list is just an example. It could be a dict or another list or whatever, it's not that important.
But whatever I've tried up to now I get results like:
[('ids', '12 fields')]
Pyparsing is eating the next key, considering it's also part of the value.
Here is a sample code:
import pyparsing as P
key = P.oneOf("ids fields")
equal = P.Literal('=')
key_equal = key + equal
val = ~key_equal + P.Word(P.alphanums+', ')
gr = P.Group(key_equal+val)
print(gr.parseString("ids = 12 fields = name"))
Can someone help me? Thanks.
The first problem lies in this line:
val = ~key_equal + P.Word(P.alphanums+', ')
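It reads as though the Word part matches any alphanumeric sequence followed by the literal ', ', but the argument to Word is a set of allowed characters, so it actually matches any run of alphanumeric characters, ',' and ' '. Since ' ' is in that set, the match runs straight on into the next key.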
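To see the character-set behaviour in isolation, here is a minimal sketch that applies the same Word expression directly to the tail of the sample line. The names loose and matched are made up for this illustration.

```python
import pyparsing as P

# Word() takes a set of allowed characters, not a pattern to match in order.
# Because ' ' is in the set, the match does not stop at the end of "12".
loose = P.Word(P.alphanums + ', ')

matched = loose.parseString("12 fields = name")[0]

# The match runs on through "fields" and only stops at '=',
# which is not in the allowed character set.
print(repr(matched))
```

The lookahead ~key_equal in front of it does not help here, because it is only checked once, at the position where the value starts.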
What you'd want instead is:
val = ~key_equal + P.delimitedList(P.Word(P.alphanums), ", ", combine=True)
The second problem is that you only parse one key-value pair:
gr = P.Group(key_equal+val)
Instead, you should parse as many as possible:
gr = P.OneOrMore(P.Group(key_equal+val))
So the correct solution is:
>>> import pyparsing as P
>>> key = P.oneOf("ids fields")
>>> equal = P.Literal('=')
>>> key_equal = key + equal
>>> val = ~key_equal + P.delimitedList(P.Word(P.alphanums), ", ", combine=True)
>>> gr = P.OneOrMore(P.Group(key_equal+val))
>>> print(gr.parseString("ids = 12, 13, 14 fields = name, title"))
[['ids', '=', '12, 13, 14'], ['fields', '=', 'name, title']]
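If you want the (key, values) tuples from the question rather than groups that still contain the '=', one possible sketch is below. It makes a few assumptions of its own: the '=' is wrapped in Suppress so it is dropped from the results, the value is written out with Combine and ZeroOrMore (which is meant to be equivalent to the delimitedList(..., combine=True) call above), and parse_pairs is a hypothetical helper name, not part of pyparsing.

```python
import pyparsing as P

key = P.oneOf("ids fields")
equal = P.Suppress('=')            # match '=' but keep it out of the results
key_equal = key + equal

item = P.Word(P.alphanums)
# Spelled-out form of delimitedList(item, ", ", combine=True):
# items joined by the literal ", " and merged back into one string.
val = ~key_equal + P.Combine(item + P.ZeroOrMore(P.Literal(", ") + item))

gr = P.OneOrMore(P.Group(key_equal + val))


def parse_pairs(line):
    """Hypothetical helper: return the line as a list of (key, value) tuples."""
    return [tuple(group) for group in gr.parseString(line)]


pairs = parse_pairs("ids = 12, 13, 14 fields = name, title")
print(pairs)
```

Passing the same list to dict() would give a mapping from key to value instead, if that suits you better.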