开发者

Language Parser with conditions .NET

I am looking for a language parser that can resolve somthing like the following:

(x=7 OR y=1) AN开发者_StackOverflowD (x>0)

I was thinking of using ANTLR Parser Generator. Is there a more simple Language Parser in more advanced .NET framework (.NET 3.5, .NET 4.0)?


If the problem is just an expression involving monadic and binary operators, parentheses, variable values and constant operands, you can write a recursive descent parser/evaluator that will both parse and evaluate the expression at the same time. No need to build trees or IL or...

[If you need to code more complex syntax such as statements and methods, you'll need a more complex parser, and then a parser generator pays off]

For just an expression, you can code the recursive descent parser directly from a BNF for your expression. Make sure you first left-factor the commonality for each rule, e.g., not

 SUM = TERM '+' TERM ;
 SUM = TERM '-' TERM ;

but

 SUM = TERM ( '+' TERM |  '-' TERM ) ;

For each left hand-side nonterminal, create a (possibly recursive) subroutine taking an index into the string to be parsed, that returns either the value of an expression (assume float), or throws a (syntax) error. For each right hand side token, you code a test; if the token isn't present, it passes control to an alternative, or throws a syntax error if no other alternatives. If the token is present, it evalutes the token: if a terminal value (e.g., number or varable), gets the value and advances the input; if a nonterminal, calls it to see if that syntax is present; if an operator, simply ignores it but advances the input. If your tests match all the elements of a right hand side, compute the expression result and return it.

So for the grammar:

EXP = SUM ;
SUM = TERM ( '+' TERM |  '-' TERM ) ;
TERM = PRIMARY ( '*' PRIMARY | '/' PRIMARY ) ;
PRIMARY = '-' PRIMARY | '(' EXP ')' | NUMBER | VARIABLE ;

And given a buffer of characters INPUT containing the expression, and global variable I an index into INPUT, the code is roughly:

float EXP()
{  return SUM();
}

float SUM()
{  float t=TERM();
   if MATCH("+") return t+TERM(); 
   if MATCH("-") return t-TERM();
   throw SYNTAXERROR;
}

float TERM()
{  float t= PRIMARY();
   if MATCH("*") return t*PRIMARY();
   if MATCH("/") return t/PRIMARY();
   throw SYNTAXERROR;
}

float PRIMARY()
{  float t;
   if MATCH("-") return -PRIMARY();
   if MATCH("(")
       { t=EXP();
         if MATCH(")") return t;
         else throw SYNTAXERROR
       }
   try t=NUMBER();
   catch SYNTAXERROR
      return VARIABLE();
   endtry
}

float NUMBER()  // simple float input conversion
{ float t=0;
  fractiondigits=0;
  exponent=0;
  switch INPUT(I)
  {  case "0".."9":
     { t=t*10+INPUT(I)-"0"; I++;
       while ISDIGIT(INPUT(I))
          { t=t*10+INPUT(I)-"0"; I++ }
       if MATCH(".")
          goto collect_fraction;
       else goto collect_exponent
      }
     case ".": goto collect_fraction;
     default: throw SYNTAXERROR
  }
  collect_fraction:
     while ISDIGIT(INPUT(I))
       { t=t*10+INPUT(I)-"0"; I++; fraction_digits++; }
  collect_exponent:
     if MATCH("E")
       { sign=false;
         if MATCH("-") sign=true;
         if !ISDIGIT(INPUT(I)) throw SYNTAXERROR;
         while ISDIGIT(INPUT(I))
           { exponent=exponent*10+INPUT(I)-"0"; I++; }
         if sign=true then exponenent=-exponent;
       }
       return t*10^(exponent-fractiondigits);
}

float VARIABLE() // handles single letter variable names.
{  if ISLETTER(INPUT(I))
   { I++;
     return VARIABLEVALUE[INPUT(I)-"A"]
   }
   else throw SYNTAXERROR
}

 boolean MATCH(c: char)
 {  if INPUT(I)==c
       {  I++;
          return true;
       }
    else return false

You'll obviously want to write a grammar for the expressions you want to evaluate. But assuming you are adding only relationals and AND, OR and NOT operators, following this style it should take you about 30 minutes to code the whole thing.

This doesn't account for how the input expression got collected into the INPUT buffer, nor does it address the issue of how variables got values; I've assumed that the map of variable names to values got magically filled somehow in advance. If you want to allow simple assignments as well as expressions, just extend the BNF a bit by adding a rule to allow assignments e.g.,

EXP = VARIABLE ':=' EXP ;

Processing the assignment will require a bit of trickiness: as you match the pieces and discover the VARIABLE, you'll need a way to capture the variable name (modify VARAIBLE to remember the variable name in a global), and where the syntax of the assignment rule has been recognized, update the map of variable names to value collected.

The float input code is a hack, and can produce slightly incorrect input values (but it was easy to code :) If you want more precise float input conversions, you should simply collect the characters that make up the float constant, and then hand them to a library string-to-float conversion routine.


If you just need simple expressions like in your example you can use NCalc. Fast and easy to use.


You can take a look at Irony and the SampleExpressionEvaluator in the exemples.

And here's a good article to start with (if you know the basics of defining grammars you'll be ready very quickly):


For simple expressions like the ones given, I would write a recursive descent parser. Write out your BNF first and then just code it out. You can take one of 3 approaches in terms of evaluation:

  1. Emit expression trees
  2. Emit IL
  3. Roll your own bytecode VM

I'd go with (1) as it's the simplest approach.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜