Parsing a TCL-like text
I have a configuration text that looks like this:
text="""
key1 value1
key2 { value1 value2 }
key3 subkey1 {
key1 1
key2 2
key3 {
value1
}
}
BLOBKEY name {
dont {
# comment
parse { me }
}
}
key3 subkey2 {
key1 value1
}
"""
The values are plain strings or quoted strings. The keys are just alphanumeric strings. I know beforehand that key2 and key3.subkey1.key4 will hold sets, so I can treat those paths differently. Likewise, I know that BLOBKEY will contain an "escaped" configuration section.
The goal is to convert it into a dictionary that looks like this:
{'key1': 'value1',
 'key2': set(['value1', 'value2']),
 'key3': {
     'subkey1': {
         'key1': 1,
         'key2': 2,
         'key3': set(['value1']),
     },
     'subkey2': {
         'key1': 'value1'
     }
 },
 'BLOBKEY': {
     'name': " dont {\n # comment\n parse { me }\n }\n"
 }
}
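One way to read the target above: each entry contributes a key path (for example `key3 subkey1 key1`) and a value, and entries that share a prefix are merged into the same nested dictionary. Below is a minimal sketch of that merging step, independent of any parser. The names `set_path` and `SET_PATHS` are hypothetical helpers introduced for illustration, and the set paths are an assumption read off the target dictionary.

```python
# Paths whose values should become sets (assumption: taken from the target dict).
SET_PATHS = {("key2",), ("key3", "subkey1", "key3")}


def set_path(root, path, value):
    """Store value under the nested key path, creating dicts as needed."""
    node = root
    for key in path[:-1]:
        node = node.setdefault(key, {})
    # Known set paths get a set; everything else is stored unchanged.
    node[path[-1]] = set(value) if tuple(path) in SET_PATHS else value
    return root


config = {}
set_path(config, ["key1"], "value1")
set_path(config, ["key2"], ["value1", "value2"])
set_path(config, ["key3", "subkey1", "key1"], "1")
set_path(config, ["key3", "subkey1", "key3"], ["value1"])
set_path(config, ["key3", "subkey2", "key1"], "value1")
```

Because `setdefault` reuses an existing child dictionary, the two `key3` entries end up side by side under one `key3` key rather than overwriting each other. Values are stored as parsed strings here; converting `"1"` to an integer would be a separate step.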
The code below does a pretty good job of breaking it down into a bunch of nested lists.
import pyparsing
string = pyparsing.CharsNotIn("{} \t\r\n")
group = pyparsing.Forward()
group << (
    pyparsing.Group(pyparsing.Literal("{").suppress() +
                    pyparsing.ZeroOrMore(group) +
                    pyparsing.Literal("}").suppress()) |
    string
)
toplevel = pyparsing.OneOrMore(group)
What's the best way to get the result I want, in Python using pyparsing?
Here's my progress so far. It doesn't parse raw blobs, but everything else seems right.
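For the raw blobs, one possible approach is to pull the section out by counting braces before (or instead of) handing it to the grammar, so the body is kept verbatim. This is a minimal sketch using only the standard library; `extract_blob` is a hypothetical helper, and it assumes braces inside the blob are balanced and never appear in comments or quoted strings.

```python
def extract_blob(text, start):
    """Return (raw_body, end_index) for the brace group opening at text[start]."""
    if text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Body excludes the outer braces; i + 1 is where parsing resumes.
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


sample = "BLOBKEY name {\ndont {\n# comment\nparse { me }\n}\n}\nkey1 value1\n"
body, end = extract_blob(sample, sample.index("{"))
# body holds the untouched text between the outer braces;
# sample[end:] is the remainder that still needs normal parsing.
```

Within pyparsing itself, wrapping `nestedExpr("{", "}")` in `originalTextFor` may give a similar result inside the grammar, though that combination is untested here.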
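from pyparsing import (Literal, Word, Combine, Group, ZeroOrMore, OneOrMore,
                       Optional, Forward, alphanums, dblQuotedString,
                       removeQuotes, lineEnd)
LBRA = Literal("{").suppress()
RBRA = Literal("}").suppress()
EOL = lineEnd.suppress()
# Raw string so the backslash is taken literally.
tmshString = Word(alphanums + r'!#$%&()*+,-./:;<=>?@[\]^_`|~')
tmshValue = Combine( tmshString | dblQuotedString.setParseAction( removeQuotes ))
tmshKey = tmshString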
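def toSet(s, loc, t):
    # t[0] is the Group of values found between the braces.
    return set(t[0])
tmshSet = LBRA + Group(ZeroOrMore(tmshValue.setWhitespaceChars(' '))).setParseAction(toSet) + RBRA
def toDict(d, l):
    # l[0] is the key; the remaining items are values or nested key groups.
    if l[0] not in d:
        d[l[0]] = {}
    for v in l[1:]:
        if isinstance(v, list):
            toDict(d[l[0]], v)
        else:
            d[l[0]] = v
def trueDefault(s, loc, t):
    # A key with no value is treated as a toggle and defaults to True.
    return len(t) and t or True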
singleKeyValue = Forward()
singleKeyValue << (
    Group(
        tmshKey + (
            # A toggle value (i.e. key without value).
            EOL.setParseAction(trueDefault) |
            # A set of values on a single line.
            tmshSet |
            # A normal value or another singleKeyValue group.
            Optional(tmshValue | LBRA + ZeroOrMore(singleKeyValue) + RBRA).setParseAction(trueDefault)
        )
    )
)
multiKeysOneValue = Forward()
multiKeysOneValue << (
    Group(
        tmshKey + (
            multiKeysOneValue |
            tmshSet |
            LBRA + ZeroOrMore(singleKeyValue) + RBRA
        )
    )
)
toplevel = OneOrMore(multiKeysOneValue)
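# Now parse the data and print the results.
from pprint import pprint
data = toplevel.parseString(text)  # 'text' is the sample configuration defined above
h = {}
# A plain loop, since map() is lazy on Python 3 and would never call toDict.
for item in data.asList():
    toDict(h, item)
pprint(h)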