Token Attributes

I have written a simple lexical analyzer, and I understand the need to give each recognized token an attribute. Here is what I have so far:

public sealed class Token
{ 
    public enum TokenClass
    { 
        Identifier,
        StringLiteral,
        NumberLiteral,
        Operator,
        PunctuationSeparator,
        Bracket,
        Parenthesis
    }        
    public TokenClass Class { get; internal set; }
    public String     Value { get; internal set; }
}

In the lexer I enqueue tokens, setting their value and class. But what about attributes? How should I design this feature around my existing token class?

My first thought was:

  1. Declare private abstract classes for "ambiguous entities" inside the token class (by that I mean that a Number could be an Integer, a Real, and so on);
  2. Then declare inherited classes, e.g. public class Comma : PunctuationSeparator {};
  3. Add a property object Attribute { get; private set; };
  4. Then create a method like private void ApplyAttribute();
  5. Call ApplyAttribute() when a token is instantiated and its properties are set;
  6. Use something like this inside ApplyAttribute():

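    switch (this.Class)
    {
    case TokenClass.NumberLiteral:
        {
            // Text that parses as an Int32 gets an Integer attribute; anything else is treated as Real.
            int ignored;
            this.Attribute = Int32.TryParse(this.Value, out ignored) ? (object)new Integer() : new Real();
            break;
        }
    }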
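A self-contained sketch of steps 1 to 6, restricted to number literals, might look like the following. It assumes a constructor instead of the internal setters, and the base class name NumberAttribute is hypothetical. The attribute classes are public here rather than private: C# rejects a public class deriving from a less accessible base (inconsistent accessibility), and the parser could not name a private nested class in an is check anyway.

```csharp
using System;
using System.Globalization;

public sealed class Token
{
    public enum TokenClass
    {
        Identifier,
        StringLiteral,
        NumberLiteral,
        Operator,
        PunctuationSeparator,
        Bracket,
        Parenthesis
    }

    // Hypothetical attribute hierarchy: one abstract base per "ambiguous" entity,
    // one concrete subclass per variant. Public so the parser can test with `is`.
    public abstract class NumberAttribute { }
    public sealed class Integer : NumberAttribute { }
    public sealed class Real : NumberAttribute { }

    public TokenClass Class { get; }
    public string Value { get; }
    public object Attribute { get; private set; }

    public Token(TokenClass cls, string value)
    {
        Class = cls;
        Value = value;
        ApplyAttribute(); // step 5: derive the attribute once class and value are set
    }

    private void ApplyAttribute()
    {
        switch (Class)
        {
            case TokenClass.NumberLiteral:
                // Text that parses as a 32-bit integer is an Integer; anything else is treated as Real.
                Attribute = int.TryParse(Value, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)
                    ? (object)new Integer()
                    : new Real();
                break;
            default:
                Attribute = null; // this sketch attaches no attribute to other token classes
                break;
        }
    }
}
```

The parser would then write CurToken.Attribute is Token.Integer. Note that with this test an integer literal too large for Int32 falls through to Real, and every further variant (Comma, Semicolon, ...) costs another class, which is exactly the overhead the question is worried about.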
    

In the parser it would then be easy to write something like if (CurToken.Attribute is Integer). The one thing that stops me from doing this is the number of classes I would have to create. Is this solution acceptable?


The attributes I'd use for a token? Probably something along the lines of

public class Token
{
  public TokenType Type { get ; private set ; }
  public string    Text { get ; private set ; }
  public int       LineNumber { get ; private set ; }
  public int       Column     { get ; private set ; }
}

public enum TokenType
{
  Keyword = 1 ,
  Integer ,
  String  ,
  Whitespace ,
  Comment ,
  // ...
}

I disagree, though, with the previous poster regarding conversion of the token's text into a 'value'. IMHO, that is the domain of the parser and the nodes of the parse tree. Until the parser has placed the tokens in context with respect to the grammar, a token is just a piece of text with a label attached to it. The lexical analyzer doesn't know (and shouldn't care) what's happening downstream; for all it knows, the tool is pretty-printing the source text (in which case you want to leave the individual tokens alone).
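A minimal sketch of that split, assuming the Token and TokenType shapes from this answer with a constructor added (the private setters need one): the lexer only records type, text and position, and the conversion to int happens when the parser builds a node. IntegerLiteralNode and ParserSketch.ParseIntegerLiteral are hypothetical names standing in for a parse-tree node and a parser routine.

```csharp
using System;
using System.Globalization;

public enum TokenType
{
    Keyword = 1,
    Integer,
    String,
    Whitespace,
    Comment
}

// Token as proposed above: a label, the raw text and its position, nothing more.
public class Token
{
    public TokenType Type { get; private set; }
    public string Text { get; private set; }
    public int LineNumber { get; private set; }
    public int Column { get; private set; }

    public Token(TokenType type, string text, int lineNumber, int column)
    {
        Type = type;
        Text = text;
        LineNumber = lineNumber;
        Column = column;
    }
}

// Hypothetical parse-tree node: this is where the text becomes a typed value.
public sealed class IntegerLiteralNode
{
    public int Value { get; }
    public IntegerLiteralNode(int value) { Value = value; }
}

public static class ParserSketch
{
    // Hypothetical parser routine: converts an Integer token into a node,
    // using the token's position for error messages.
    public static IntegerLiteralNode ParseIntegerLiteral(Token token)
    {
        if (token.Type != TokenType.Integer)
            throw new FormatException(
                $"Expected an integer at {token.LineNumber}:{token.Column}, found {token.Type}.");

        // NumberStyles.None accepts digits only; out-of-range text also fails here.
        if (!int.TryParse(token.Text, NumberStyles.None, CultureInfo.InvariantCulture, out int value))
            throw new FormatException(
                $"'{token.Text}' at {token.LineNumber}:{token.Column} is not a valid Int32 literal.");

        return new IntegerLiteralNode(value);
    }
}
```

Because the token keeps its original text, a pretty-printer can consume the same token stream and reproduce the source exactly, while the parser decides per context what the text means.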

You might want to take a look at Terence Parr's books:

  • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages
  • The Definitive ANTLR Reference: Building Domain-Specific Languages


Instead of

public String Value { get; internal set; }

just use

public object Value { get; internal set; }

and then store integer and floating-point literals there as actual int and double values. Then in your parser you can just say

if (token.Value == null)
{
    // blah
}
else if (token.Value is int)
{
    // work with (int) token.Value
}
else if (token.Value is double)
{
    // work with (double) token.Value
}
else if (token.Value is string)
{
    // work with (string) token.Value
}

or alternatively:

int? integer;
double? floating;
string str;

if (token.Value == null)
{
    // blah
}
else if ((integer = token.Value as int?) != null)
{
    // work with integer.Value
}
else if ((floating = token.Value as double?) != null)
{
    // work with floating.Value
}
else if ((str = token.Value as string) != null)
{
    // work with str
}
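On C# 7.0 or later, type patterns in a switch test and bind in a single step, which can replace both chains above. The sketch below also includes a lexer-side helper that stores a numeric lexeme as int when it fits in Int32 and as double otherwise; the names ValueDispatch, ToValue and Describe are illustrative, not from the answer.

```csharp
using System;
using System.Globalization;

public static class ValueDispatch
{
    // Hypothetical lexer-side helper: int when the lexeme fits, else double, else keep the text.
    public static object ToValue(string lexeme)
    {
        if (int.TryParse(lexeme, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
            return i;
        if (double.TryParse(lexeme, NumberStyles.Float, CultureInfo.InvariantCulture, out double d))
            return d;
        return lexeme;
    }

    // Parser-side dispatch on the boxed value, mirroring the if/else chains above.
    public static string Describe(object value)
    {
        switch (value)
        {
            case null:
                return "no value";
            case int integer:
                return "int " + integer;
            case double floating:
                return "double " + floating.ToString(CultureInfo.InvariantCulture);
            case string str:
                return "string \"" + str + "\"";
            default:
                return "unexpected " + value.GetType().Name;
        }
    }
}
```

For example, Describe(ToValue("42")) yields "int 42" and Describe(ToValue("2.5")) yields "double 2.5"; the default branch catches boxed types the lexer was never meant to produce, such as a long.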