开发者

Parser for user defined infix operators

I am writing an interpreter for a language where functions can be used as operators. However, the functions content will only be known at runtime.

For that I considered two solutions:

  • Parsing is done at runtime, using the run开发者_如何学运维time information on the function
  • All user-defined operators use default values for precedence and associativity.

I chose the latter as I see a number of advantages in parsing separately to execution.

Now it comes to implementation and I am interested to see what options there are. My initial thoughts are a shift reduce parser, but I have little experience in constructing parsers.

Example:

LHS op RHS : LHS * RHS     /* define a binary operator 'op' */
var : 3                    /* define a variable */
print 5 op var             /* should print 15 */

LHS op RHS : LHS / RHS     /* Re-define op */
print var op var           /* Should print 1 */

in the last case, the parser will get from the lexer: " id id id id ". Only at runtime do I know that the 'op' id is an operator.


(Posting the results of the comments, as requested.)

Solution #1 is definitely ugly, complex to implement, and unneeded, I agree. Solution #2 is by far easier to implement and comprehend. You can also allow custom associativity and precedence for operators, as long as those are known statically. The main thing is that these facts are known at parse time.

As for actual parsing, most parsers will work just fine, as any two expression surrounding an id are an application of a custom infix operator (this is less true if you allow custom precedence and associativity, in that case you need an algorithm that allows determine those on a per-operator basis at parse time). Either case, my personal favorite is a "Top Down Operator Precedence Parser", or Pratt parser. I found the following resources (ordered by usefulness to me, YMMV) describe it quite well:

  • Simple Top-Down Parsing in Python
  • Pratt Parsers: Expression Parsing Made Easy
  • Top Down Operator Precedence

Two properties of the algorithm make it suit this problem very well:

  • The lookup of associativity ("binding power") happens dynamically for each token (allowing the parser to allow the user to define precedence for their operators).
  • It's very simple to write by hand[*], and you'll probably have to do that as such an degree of dynamism is beyond the scope of most (at least all I know) parser generators.

[*] I've personally written a parser for a very large (lacking only case, multidimensional arrays and perhaps some obscure subtleties) subset of Pascal in 500 lines of Python and 2-3 days of work, the rest is only missing because other parts of the software it's used in were more interesting at the time and I didn't have a reason to implement the rest.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜