"Deparsing" a list using pyparsing

Is it possible to give pyparsing a parsed list and have it return the original string?


Yes, you can, provided you've instructed the parser not to throw away any input. The tool for this is pyparsing's Combine class, which joins the matched tokens back into a single string.

Let's say your input is:

>>> s = 'abc,def,  ghi'

Here's a parser that grabs the exact text of the list:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> _ = myList.leaveWhitespace()  # keep whitespace as tokens; assigned to _ because the call returns the expression
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

To "deparse":

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

which gives you the initial input back.

But this comes at a cost: having all that extra whitespace floating around as tokens is usually not convenient, and you'll note that we had to explicitly turn whitespace skipping off in myList. Here's a version that strips whitespace:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

Note that this is no longer the literal input (the two spaces before ghi are gone), though it may be good enough for you. Also note that we had to pass adjacent=False to tell Combine explicitly that whitespace may be skipped between the tokens.
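If you want the exact input back without carrying whitespace tokens around, one option worth trying is pyparsing's originalTextFor helper, which returns the slice of the input that an expression matched. The following is a minimal sketch, assuming a pyparsing version that still exports the camelCase names; the variable names my_list and exact_text are just for illustration:

```python
from pyparsing import Word, ZeroOrMore, alphas, originalTextFor

s = 'abc,def,  ghi'

# Same whitespace-skipping list parser as above.
my_list = Word(alphas) + ZeroOrMore(',' + Word(alphas))

# originalTextFor wraps the expression and yields the matched slice of
# the input string instead of the individual tokens.
exact_text = originalTextFor(my_list)
result = exact_text.parseString(s)

print(result[0])
```

Here the skipped whitespace survives because the text is sliced from the input rather than rebuilt from tokens.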

Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a predefined expression called commaSeparatedList that conveniently strips both delimiters and whitespace for you:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
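Once the delimiters are gone, the simplest way to "deparse" is to skip pyparsing and put a delimiter of your choosing back yourself. This is only a sketch: deparse_items is a hypothetical helper, not part of pyparsing, and the result is a normalized string rather than the original input, since the original spacing was never recorded.

```python
def deparse_items(items, delimiter=', '):
    """Rebuild a delimited string from a list of parsed items.

    Hypothetical helper for illustration; the delimiter is an assumption,
    because the parsed list no longer says what separated the items.
    """
    return delimiter.join(items)


# The items as commaSeparatedList returned them above
# (for example via parseString(s).asList()).
items = ['abc', 'def', 'ghi']

rebuilt = deparse_items(items)
compact = deparse_items(items, delimiter=',')
```

With the default delimiter, rebuilt is 'abc, def, ghi'; passing delimiter=',' gives 'abc,def,ghi', matching the Combine(..., adjacent=False) result shown earlier.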