Token() objects in Lepl

2023-03-21 06:20 Asked by:

So I'm making my way through the tutorial for Lepl, a Python parser, and I can't quite figure out what exactly the difference is between something like Token(Real()) and just Real(). I found the docs on the function but they are pretty unhelpful.

So, what exactly does the Token() class do? Why is it different than regular Lepl classes?

Normally, LEPL operates on the stream of characters exactly as they appear in the input. That's simple, but as you have seen, you need lots of redundant rules to skip things like whitespace wherever it is legal but insignificant.

This problem has a common solution: first run the input string through a relatively simple automaton that takes care of this and other distractions. It breaks the input into pieces (e.g. numbers, identifiers, operators) and strips the ignored parts (e.g. comments and whitespace). This makes the rest of the parser simpler, but LEPL's default model has no place for this automaton, which is called a tokenizer or lexical analyzer (lexer for short).
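To make the idea concrete, here is a minimal sketch of such a tokenizer using only Python's standard re module. It is not LEPL's implementation; the token kinds (NUMBER, IDENT, OP, SKIP) and the tokenize() helper are made up for illustration.

```python
import re

# Hypothetical token kinds for illustration; these are not LEPL's definitions.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+|#[^\n]*"),  # whitespace and comments are recognised, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, value) pairs, discarding SKIP matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("x = 3.5 + y  # a comment")
```

The parser that runs afterwards only ever sees the (kind, value) pairs, so none of its rules have to mention whitespace or comments.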

Each kind of token is usually defined by a regular expression that describes what the token may contain, e.g. [+-]?[0-9]+ for integers with an optional sign. You can (and sometimes should) do just that with Token(): for example, Token('a+b+') gives a parser that consumes as much of the input as the regex matches and returns it as a single string. For the most part these parsers work just like all others; most importantly, they can be combined in the same ways. For example, Token('a+') & Token('b+') works and is equivalent to the previous one except that it produces two strings, while Token('a+') + Token('b+') is exactly equivalent. Thus far, tokens are just a shorter notation for some basic building blocks of a grammar. You can also pass some of LEPL's matchers to Token(), which converts them into an equivalent regular expression and uses that as the token - e.g. Token(Literal('ab+')) is Token(r'ab\+').
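The regex behaviour described here can be tried out with plain re, without LEPL. The sketch below only mirrors the idea (greedy match, two results versus one joined result, and escaping a literal); match_token() is a hypothetical helper, and re.escape() stands in for whatever conversion LEPL performs on Literal internally.

```python
import re


def match_token(pattern, text, pos=0):
    """Return (matched_text, end_position), or None if pattern fails at pos."""
    m = re.compile(pattern).match(text, pos)
    return None if m is None else (m.group(), m.end())


text = "aaabbc"

# One pattern: consumes as much as the regex matches, yields a single string.
whole = match_token(r"a+b+", text)

# Two patterns in sequence: same input consumed, but two separate strings.
first = match_token(r"a+", text)
second = match_token(r"b+", text, first[1])
separate = [first[0], second[0]]

# Joining the two results gives back the single-string result.
joined = first[0] + second[0]

# A literal string becomes a regex by escaping its special characters.
literal_pattern = re.escape("ab+")
literal_match = match_token(literal_pattern, "ab+ab")
```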

The one important difference, and the huge advantage, is that with tokens you can also give patterns that are matched and discarded when no token matches at that point in the input. The default discards whitespace, which makes ignoring whitespace very easy (while still allowing the parser to require whitespace in some places). The downside is that you have to wrap all non-token matchers in tokens, or write equivalent rules by hand if they can't be converted automatically.
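The "discard only if no token matches" rule can be sketched as follows, again with plain re rather than LEPL. The names TOKENS, DISCARD and lex() are assumptions made for this example. A newline is declared as a token, so it survives and a grammar can require it, while other whitespace falls through to the discard pattern.

```python
import re

# Hypothetical token set: a newline is significant here, other whitespace is not.
TOKENS = [
    ("NEWLINE", re.compile(r"\n")),
    ("WORD", re.compile(r"[a-z]+")),
]
# Fallback pattern, tried only when no token matches at the current position.
DISCARD = re.compile(r"\s")


def lex(text):
    """Return (kind, value) pairs; tokens always win over the discard pattern."""
    out = []
    pos = 0
    while pos < len(text):
        for name, regex in TOKENS:
            m = regex.match(text, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No token matched here: drop one character of ignorable input.
            m = DISCARD.match(text, pos)
            if m is None:
                raise ValueError(f"cannot lex {text[pos]!r} at position {pos}")
            pos = m.end()
    return out


result = lex("ab  cd \nef")
```

Because the discard pattern is only a fallback, the newline is kept even though it is whitespace, and the spaces around it disappear.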
