How do I specify token ordering in pyparsing?
Suppose I'm parsing the following line:
The quick brown fox jumps over the lazy dog
I'd like to parse this as:
Words('The quick brown fox') + Literal('jumps') + Words('over the lazy dog')
My current pyparsing definition is:
some_words = OneOrMore(Word(alphas))
jumps = Literal('jumps')
sentence = some_words + jumps + some_words
What's happening is that some_words swallows up the 'jumps', and I get a parsing error. How do I make pyparsing lex the jumps as a literal token?
You are already thinking like the parser, since you understand that OneOrMore(Word(alphas)) keeps going, right through the word "jumps". Now turn that around and write the parser to work the way you think.
For every word up to "jumps", how do you know that it should be added to the leading set of words? You know for each word because it is not the word "jumps". Pyparsing does not automatically do this lookahead, but you can do it for yourself with NotAny (which can be abbreviated using the '~' operator):
JUMPS = Literal("jumps")
some_words = OneOrMore(~JUMPS + Word(alphas))
Now before matching another word, some_words first verifies that the word is not "jumps".
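Putting the pieces together, here is a minimal sketch of the whole sentence parser. The names leading_words and trailing_words and the use of Group to keep the two runs of words separate are my own additions for illustration, not part of the original definitions:

```python
from pyparsing import Group, Literal, OneOrMore, Word, alphas

JUMPS = Literal("jumps")

# Leading words: before each word, check that "jumps" does not come next.
leading_words = OneOrMore(~JUMPS + Word(alphas))
# Trailing words: nothing follows them, so no lookahead is needed here.
trailing_words = OneOrMore(Word(alphas))

# Group (an illustrative choice) keeps each run of words in its own sublist.
sentence = Group(leading_words) + JUMPS + Group(trailing_words)

result = sentence.parseString("The quick brown fox jumps over the lazy dog")
parsed = result.asList()
print(parsed)
# [['The', 'quick', 'brown', 'fox'], 'jumps', ['over', 'the', 'lazy', 'dog']]
```

One caveat: Literal("jumps") also matches the start of a longer word such as "jumpsuit". If that matters for your input, Keyword("jumps") matches only the whole word and can be dropped in for both the lookahead and the token itself.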