How do I specify token ordering in pyparsing?

Suppose I'm parsing the following line:

The quick brown fox jumps over the lazy dog

I'd like to parse this as:

Words('The quick brown fox') + Literal('jumps') + Words('over the lazy dog')

My current pyparsing definition is:

some_words = OneOrMore(Word(alphas))
jumps      = Literal('jumps')
sentence   = some_words + jumps + some_words

The problem is that some_words swallows up 'jumps' as well, so I get a parsing error. How do I make pyparsing treat jumps as a literal token?


You are already thinking like the parser, since you understand that OneOrMore(Word(alphas)) keeps going, even consuming the word "jumps". Now turn that around and write the parser to work the way you think.

For every word up to "jumps", how do you know that it should be added to the leading set of words? You know because each of those words is not the word "jumps". Pyparsing does not do this lookahead automatically, but you can do it yourself with NotAny (which can be abbreviated using the '~' operator):

JUMPS = Literal("jumps")
some_words = OneOrMore(~JUMPS + Word(alphas))

Now before matching another word, some_words first verifies that the word is not "jumps".
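For reference, here is a minimal end-to-end sketch of the full sentence parser built on that lookahead. Two details are choices of this sketch and not part of the answer above: it uses Keyword in place of Literal, so that a longer word such as "jumpsuit" is not rejected by the lookahead, and it wraps each run of words in Group so the two halves come back as separate sublists. It calls the camelCase parseString and asList methods; pyparsing 3 also provides parse_string and as_list.

```python
from pyparsing import Group, Keyword, OneOrMore, Word, alphas

# Keyword (unlike Literal) only matches "jumps" as a whole word,
# so a word such as "jumpsuit" does not trip the lookahead.
JUMPS = Keyword("jumps")

# Negative lookahead: accept a word only if it is not "jumps".
some_words = OneOrMore(~JUMPS + Word(alphas))

# Group is used only to keep each run of words in its own sublist.
sentence = Group(some_words) + JUMPS + Group(some_words)

result = sentence.parseString("The quick brown fox jumps over the lazy dog")
tokens = result.asList()
print(tokens)
# [['The', 'quick', 'brown', 'fox'], 'jumps', ['over', 'the', 'lazy', 'dog']]
```

With Literal("jumps") the sketch behaves the same on this input; the difference only shows up when a word merely starts with "jumps".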
