Building a Regex-Based Parser [closed]

Posted 2023-02-19 09:56

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

Is it stupid to build a regex-based parser?

Matching nested parentheses is exceedingly simple with modern patterns. Ignoring whitespace (the pattern is written for /x extended mode), something like this:

\( (?: [^()] *+ | (?0) )* \)

works for mainstream languages like Perl and PHP, plus anything that uses PCRE.
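Python's standard re module has no recursion construct like (?0), so the following is only a sketch of what the recursive pattern does, not a port of it: a hand-written Python function (the name `match_balanced` is hypothetical) that follows the pattern piece by piece, consuming non-paren characters and recursing whenever it meets a nested `(`.

```python
from typing import Optional


def match_balanced(text: str, pos: int = 0) -> Optional[int]:
    r"""Return the index just past a balanced (...) group starting at text[pos],
    or None if no balanced group starts there.

    Structurally mirrors  \( (?: [^()]*+ | (?0) )* \)  without regex recursion.
    """
    if pos >= len(text) or text[pos] != "(":
        return None                        # the leading \(
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == ")":
            return i + 1                   # the trailing \)
        if ch == "(":
            end = match_balanced(text, i)  # the (?0) branch: recurse on the whole pattern
            if end is None:
                return None
            i = end
        else:
            i += 1                         # the [^()] branch: any non-paren character
    return None                            # input ended before the group closed


if __name__ == "__main__":
    s = "f(a, (b + c) * d) + e"
    end = match_balanced(s, 1)
    print(s[1:end])                        # (a, (b + c) * d)
```

If you want the recursive pattern itself in Python, the third-party regex package (not the standard re module) supports recursive patterns such as (?R) and (?0).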

However, you really need grammatical regexes for a full parse (patterns built from named subrules, such as Perl's (?(DEFINE)...) blocks invoked with (?&name)), or you'll go nuts. Don't use a language whose regexes can't be broken down into smaller units, or that doesn't support proper debugging of pattern compilation and execution. Life's too short for low-level hackery. You might as well go back to assembly language if you're going to do that.
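As a small illustration of breaking a pattern into smaller units, here is a minimal Python sketch that assembles one pattern from named sub-patterns with string interpolation and `re.VERBOSE`. The `name = number;` statement syntax and the names `IDENT`, `NUMBER`, `ASSIGN`, and `parse_assignment` are hypothetical. Python's re cannot recurse, so this shows only the composition side; for the debugging side, compiling with the `re.DEBUG` flag prints the parsed pattern tree (Perl's rough counterpart is `use re 'debug'`).

```python
import re
from typing import Optional, Tuple

# Hypothetical sub-rules, each small enough to read and test on its own.
IDENT = r"(?P<name>[A-Za-z_]\w*)"        # identifier: letter or underscore, then word chars
NUMBER = r"(?P<value>-?\d+(?:\.\d+)?)"   # optional sign, digits, optional fraction

# The statement rule is assembled from the sub-rules; VERBOSE lets us space it out.
ASSIGN = re.compile(
    rf"""
    ^ \s* {IDENT} \s* = \s* {NUMBER} \s* ; \s* $
    """,
    re.VERBOSE,
)


def parse_assignment(line: str) -> Optional[Tuple[str, float]]:
    """Return (name, value) for a line like 'x = 42;', or None if it does not match."""
    m = ASSIGN.match(line)
    if m is None:
        return None
    return m["name"], float(m["value"])


if __name__ == "__main__":
    print(parse_assignment("rate = -1.5;"))  # ('rate', -1.5)
    print(parse_assignment("1x = 2;"))       # None
```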

I’ve written about recursive patterns, grammatical patterns, and parsing quite a bit: for example, see here for parsing approaches and here for lexer approaches; also, the final solution here.

Also, Perl's Regexp::Grammars module is especially useful for turning grammatical regexes into parse-tree data structures.

So by all means, go for it. You’ll learn a lot that way.

For work? Yes. For learning? No.

The allure of parsing your own little languages with regular expressions cannot be overstated: most sysadmins could write a simple language parser entirely in Perl very quickly, but parsing the same language with lex/yacc would take most programmers a few hours.

And the Perl version would probably just about do the job. But as gpvos points out, using a regex backend for your parsing drastically reduces future enhancement options, and attempts to work around its limitations sometimes lead to some pretty awful code, when those same enhancements would be easy to handle with table-driven tools or hand-written recursive descent parsers.

If you know the language will always remain easily parseable with regexes, you might do the right thing by spending an hour to get the job done, rather than four or five hours relearning lex and yacc well enough to write a similar parser with stronger tools. But if the language is liable to grow or change much, using real parser generators will probably help in the long run.
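For a language that really is flat and line-oriented, the regex-only approach can be this small. The following Python sketch assumes a hypothetical INI-like format (`[section]` headers, `key = value` lines, `#` comments); the format and the name `parse_config` are illustrative, not taken from the question. Nothing in it can cope with nesting, which is exactly where a growing language starts to hurt.

```python
import re
from typing import Dict

SECTION = re.compile(r"^\s*\[(?P<section>[^\]]+)\]\s*$")            # [name]
PAIR = re.compile(r"^\s*(?P<key>[\w.-]+)\s*=\s*(?P<value>.*?)\s*$")  # key = value
SKIP = re.compile(r"^\s*(?:#.*)?$")                                  # blank line or comment


def parse_config(text: str) -> Dict[str, Dict[str, str]]:
    """Parse a flat, line-oriented config into {section: {key: value}}."""
    result: Dict[str, Dict[str, str]] = {}
    current = result.setdefault("", {})  # keys that appear before any [section]
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SKIP.match(line):
            continue
        m = SECTION.match(line)
        if m:
            current = result.setdefault(m["section"].strip(), {})
            continue
        m = PAIR.match(line)
        if m:
            current[m["key"]] = m["value"]
            continue
        raise ValueError(f"line {lineno}: cannot parse {line!r}")
    return result


if __name__ == "__main__":
    sample = "# demo\nname = demo\n[server]\nhost = example.org\nport = 8080\n"
    print(parse_config(sample))
```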

It depends on what you want to parse, but IMO, for most practical cases the answer is "No". Regexes are quite limited in the grammars they can recognize (the exact limits depend on the regex implementation, since every engine adds its own extensions).

Since you said in your comments that you're building a parser for VBScript, forget about regexes: you need to recognize a context-free grammar. Check out GOLD Parser or ANTLR.

Often, regexes are used for the lexer (recognizing tokens), and something more powerful, such as a recursive descent parser, is used for recognizing sequences of tokens, i.e., the actual parsing.
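One common way to write the regex-as-lexer half in Python is a single alternation of named groups scanned with `re.finditer`, using `Match.lastgroup` to see which token matched (this follows the tokenizer recipe in the Python re documentation). The token kinds (`NUMBER`, `IDENT`, `OP`, `SKIP`, `MISMATCH`) and the function name `tokenize` are hypothetical choices for an arithmetic-style language; the parser that consumes the tokens is not shown here.

```python
import re
from typing import Iterator, Tuple

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"\s+"),      # whitespace: matched but not emitted
    ("MISMATCH", r"."),    # any other character is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, lexeme) pairs; checking their order is the parser's job."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        yield kind, m.group()


if __name__ == "__main__":
    print(list(tokenize("rate * (3 + 4.5)")))
```

Note that the lexer happily accepts an unbalanced string like "((1" as a token stream; rejecting it is the parser's responsibility.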

For very simple languages, a regex could be enough, but you would be severely limiting yourself. For example, a classic regular expression cannot parse arbitrarily nested expressions such as (1 + 2) * 3 - 4, because it cannot track matching parentheses to unbounded depth.
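To make the contrast concrete, here is a minimal recursive-descent evaluator in Python for + - * / with parentheses, which handles (1 + 2) * 3 - 4. A regex does only the token splitting; the grammar in the comments (`expr`, `term`, `factor`) and all function and class names are illustrative assumptions, and the sketch evaluates directly rather than building a syntax tree, to stay short.

```python
import re
from typing import List, Optional

# Grammar (usual precedence, left-associative binary operators):
#   expr   := term   (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text: str) -> List[str]:
    """Split text into number and operator tokens, rejecting anything else."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"bad input at position {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per grammar rule; nesting is handled by ordinary recursion."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self) -> float:
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self) -> float:
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self) -> float:
        if self.peek() == "(":
            self.take("(")
            value = self.expr()  # recursion: this is what handles nesting
            self.take(")")
            return value
        tok = self.take()
        try:
            return float(tok)
        except ValueError:
            raise ValueError(f"expected a number, got {tok!r}") from None


def evaluate(text: str) -> float:
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("(1 + 2) * 3 - 4"))  # 5.0
```

Each grammar rule becomes one method, and parentheses of any depth are handled by factor calling back into expr, which is the part a classic regex cannot express.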

Have a look at GOLD Parser. It lets you use regular expressions to define the tokens.
