Token Attributes

Posted: 2023-02-04 11:42

I have written a simple lexical analyzer, and I understand the need to give each recognized token an attribute. Here is what I have so far:

public sealed class Token
{ 
    public enum TokenClass
    { 
        Identifier,
        StringLiteral,
        NumberLiteral,
        Operator,
        PunctuationSeparator,
        Bracket,
        Parenthesis
    }        
    public TokenClass Class { get; internal set; }
    public String     Value { get; internal set; }
}

In the lexer I enqueue tokens, setting their value and class. But what about attributes? How should I design this feature around my existing token class?

My first thought was:

1. Declare private abstract classes for "ambiguous entities" inside the token class (by that I mean that a Number could be an Integer, a Real, and so on).
2. Then declare derived classes, e.g. public class Comma : PunctuationSeparator {}
3. Add a property: object Attribute { get; private set; }
4. Then create a method such as private void ApplyAttribute();
5. Call ApplyAttribute() once the token is instantiated and its properties are set.

Inside ApplyAttribute(), use something like this:

switch (this.Class)
{
    case TokenClass.NumberLiteral:
        // Integer and Real are the derived attribute classes described above
        this.Attribute = Int32.TryParse(this.Value, out _) ? new Integer() : new Real();
        break;
}

In the parser it would then be easy to write something like if (CurToken.Attribute is Integer). The one thing that stops me from doing it this way is the number of classes I would have to create. Is this solution acceptable?
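A minimal, self-contained sketch of the design proposed above might look like the following. It rests on several assumptions. The nested Number/Integer/Real classes carry the parsed value, which the question leaves open. The properties are set through a constructor rather than `internal set`. The nested classes are public because C# rejects a public class that derives from a less accessible one (error CS0060). The sketch also makes the cost the question mentions visible: every refinement needs a new class.

```csharp
using System.Globalization;

// Sketch of the question's design: the coarse TokenClass says "number",
// and the runtime type of Attribute (Integer or Real) refines it.
public sealed class Token
{
    public enum TokenClass
    {
        Identifier, StringLiteral, NumberLiteral, Operator,
        PunctuationSeparator, Bracket, Parenthesis
    }

    // Assumed attribute hierarchy: Number is the "ambiguous entity",
    // Integer and Real are its concrete refinements carrying the parsed value.
    public abstract class Number { }

    public sealed class Integer : Number
    {
        public int Value { get; }
        public Integer(int value) { Value = value; }
    }

    public sealed class Real : Number
    {
        public double Value { get; }
        public Real(double value) { Value = value; }
    }

    public TokenClass Class { get; }
    public string Value { get; }
    public object Attribute { get; private set; }

    public Token(TokenClass tokenClass, string value)
    {
        Class = tokenClass;
        Value = value;
        ApplyAttribute(); // called once Class and Value are set
    }

    private void ApplyAttribute()
    {
        switch (Class)
        {
            case TokenClass.NumberLiteral:
                // Assumes the lexer only emits NumberLiteral for valid numeric text
                if (int.TryParse(Value, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
                    Attribute = new Integer(i);
                else
                    Attribute = new Real(double.Parse(Value, NumberStyles.Float, CultureInfo.InvariantCulture));
                break;
        }
    }
}
```

A parser could then test `if (token.Attribute is Token.Integer n)` and read `n.Value` directly, without reparsing the text.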

The attributes I'd use for a token? Probably something along the lines of:

public class Token
{
  public TokenType Type { get ; private set ; }
  public string    Text { get ; private set ; }
  public int       LineNumber { get ; private set ; }
  public int       Column     { get ; private set ; }
}

public enum TokenType
{
  Keyword = 1,
  Integer,
  String,
  Whitespace,
  Comment,
  // ...
}
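To show where `LineNumber` and `Column` could come from, here is a hedged sketch of a hand-written lexer. As it scans, it records the 1-based line and column of each token's first character. Several parts are illustrative assumptions: the `Identifier` and `Punctuation` enum members, the keyword set, and the `Lexer.Tokenize` name. String literals and comments are left out for brevity.

```csharp
using System.Collections.Generic;

// TokenType follows the answer above; Identifier and Punctuation are added
// for this sketch, and String/Comment are left unused.
public enum TokenType { Keyword = 1, Identifier, Integer, String, Whitespace, Comment, Punctuation }

public sealed class Token
{
    public TokenType Type { get; }
    public string Text { get; }
    public int LineNumber { get; }
    public int Column { get; }

    public Token(TokenType type, string text, int lineNumber, int column)
    {
        Type = type;
        Text = text;
        LineNumber = lineNumber;
        Column = column;
    }
}

public static class Lexer
{
    // Hypothetical keyword set, for illustration only
    private static readonly HashSet<string> Keywords = new HashSet<string> { "if", "else", "while" };

    public static List<Token> Tokenize(string source)
    {
        var tokens = new List<Token>();
        int pos = 0, line = 1, col = 1;
        while (pos < source.Length)
        {
            // Remember where this token starts before consuming it
            int start = pos, startLine = line, startCol = col;
            char c = source[pos];
            TokenType type;
            if (char.IsWhiteSpace(c))
            {
                type = TokenType.Whitespace;
                while (pos < source.Length && char.IsWhiteSpace(source[pos]))
                    Advance(source, ref pos, ref line, ref col);
            }
            else if (char.IsDigit(c))
            {
                type = TokenType.Integer;
                while (pos < source.Length && char.IsDigit(source[pos]))
                    Advance(source, ref pos, ref line, ref col);
            }
            else if (char.IsLetter(c) || c == '_')
            {
                while (pos < source.Length && (char.IsLetterOrDigit(source[pos]) || source[pos] == '_'))
                    Advance(source, ref pos, ref line, ref col);
                type = Keywords.Contains(source.Substring(start, pos - start))
                    ? TokenType.Keyword : TokenType.Identifier;
            }
            else
            {
                // Any other character becomes a one-character punctuation token
                type = TokenType.Punctuation;
                Advance(source, ref pos, ref line, ref col);
            }
            tokens.Add(new Token(type, source.Substring(start, pos - start), startLine, startCol));
        }
        return tokens;
    }

    // Consumes one character; a '\n' starts a new line at column 1
    private static void Advance(string source, ref int pos, ref int line, ref int col)
    {
        if (source[pos] == '\n') { line++; col = 1; }
        else { col++; }
        pos++;
    }
}
```

Because whitespace is kept as tokens, concatenating every `Text` reproduces the input exactly. That matters for tools such as pretty-printers.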

I disagree, though, with the previous poster about converting the token's text into a 'value'. IMHO, that is the domain of the parser and the nodes of the parse tree. Until the parser has placed the tokens in context with respect to the grammar, a token is just a piece of text with a label attached to it. The lexical analyzer doesn't know (and shouldn't care) what happens downstream. For all it knows, the tool is pretty-printing the source text, in which case you want to leave the individual tokens alone.
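One way to realize this split is sketched below under assumptions: the node classes and `Parser.ParsePrimary` are hypothetical names, not from any library. The lexer emits only a type and the raw text, and the numeric conversion happens when the parser builds a literal node.

```csharp
using System;
using System.Globalization;

public enum TokenType { Identifier, Integer, Real }

// The lexer's output: just a label plus the raw text, no converted value
public sealed class Token
{
    public TokenType Type { get; }
    public string Text { get; }
    public Token(TokenType type, string text) { Type = type; Text = text; }
}

// Parse-tree nodes own the converted values
public abstract class Node { }

public sealed class IntegerLiteral : Node
{
    public int Value { get; }
    public IntegerLiteral(int value) { Value = value; }
}

public sealed class RealLiteral : Node
{
    public double Value { get; }
    public RealLiteral(double value) { Value = value; }
}

public sealed class Name : Node
{
    public string Identifier { get; }
    public Name(string identifier) { Identifier = identifier; }
}

public static class Parser
{
    // Turns one token into a node; only here does the text "42" become the int 42
    public static Node ParsePrimary(Token token)
    {
        switch (token.Type)
        {
            case TokenType.Integer:
                return new IntegerLiteral(int.Parse(token.Text, NumberStyles.Integer, CultureInfo.InvariantCulture));
            case TokenType.Real:
                return new RealLiteral(double.Parse(token.Text, NumberStyles.Float, CultureInfo.InvariantCulture));
            case TokenType.Identifier:
                return new Name(token.Text);
            default:
                throw new ArgumentException($"Unexpected token type {token.Type}");
        }
    }
}
```

With this arrangement a token for `1.50` still carries the text "1.50" for any tool that needs the original spelling, while the `RealLiteral` node holds 1.5.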

You might want to take a look at Terence Parr's books:

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages
The Definitive ANTLR Reference: Building Domain-Specific Languages

Instead of

public String Value { get; internal set; }

just use

public object Value { get; internal set; }

and then store integer or floating-point literals in it as actual int or double values. Then in your parser you can just say:

if (token.Value == null)
{
    // blah
}
else if (token.Value is int)
{
    // work with (int) token.Value
}
else if (token.Value is double)
{
    // work with (double) token.Value
}
else if (token.Value is string)
{
    // work with (string) token.Value
}

or alternatively:

int? integer;
double? floating;
string str;

if (token.Value == null)
{
    // blah
}
else if ((integer = token.Value as int?) != null)
{
    // work with integer.Value
}
else if ((floating = token.Value as double?) != null)
{
    // work with floating.Value
}
else if ((str = token.Value as string) != null)
{
    // work with str
}
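On C# 7.0 or later, the same dispatch can be written more compactly as a `switch` with type patterns, which tests the type and binds the unboxed value in one step. The sketch below also shows a lexer-side helper that stores a number literal as a boxed `int` when it fits and as a `double` otherwise. `Demo`, `FromNumberText`, and `Describe` are hypothetical names chosen for illustration.

```csharp
using System.Globalization;

public sealed class Token
{
    public object Value { get; internal set; }
    public Token(object value) { Value = value; }
}

public static class Demo
{
    // Hypothetical lexer helper: a boxed int if the text fits in an int, otherwise a double
    public static Token FromNumberText(string text)
    {
        if (int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
            return new Token(i);
        return new Token(double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture));
    }

    // Parser-side dispatch on the runtime type of Value using type patterns
    public static string Describe(Token token)
    {
        switch (token.Value)
        {
            case null:
                return "no value";
            case int integer:
                return "int " + integer.ToString(CultureInfo.InvariantCulture);
            case double floating:
                return "double " + floating.ToString(CultureInfo.InvariantCulture);
            case string str:
                return "string " + str;
            default:
                return "other " + token.Value.GetType().Name;
        }
    }
}
```

Each `case` introduces a correctly typed variable, so the explicit casts and the nullable temporaries from the `as` version are no longer needed.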
