
Writing a Tokenizer class in C# for a BASIC Interpreter

For a bit of fun, I'm attempting to write a BASIC interpreter in C#. Below is my Tokenizer class (with just a few keywords so far). I'd welcome any suggestions or comments. Code clarity is more important to me than efficiency. Many thanks.

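using System;
using System.Collections.Generic;
using System.Linq; // Operators.Contains(c) relies on LINQ on older frameworks

class Tokenizer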
    {

        const string Operators = "+-/*%<>=&|";
        private List<string> Keywords = new List<string>{"LET", "DIM", "PRINT", "REM"};

        private List<string> tokens = new List<string>();
        private List<string> tokenTypes = new List<string>();

        private int tokenIndex;

        // Turn command string into tokens        
        public void Tokenize(string cmdLine)
        {
            string token = "";
            char lastc = ' ';
            bool inString = false;

            tokens.Clear();
            tokenTypes.Clear();

            // Step through line and split into tokens
            foreach (char c in cmdLine)
            {
                if (c == '"') inString = !inString;

                if (!inString)
                {
                    if (IsOperator(lastc)) AddToken(ref token);
                    if (IsWhitespace(c)) AddToken(ref token);
                    if (IsOperator(c)) AddToken(ref token);
                    if (IsNumber(c) && !IsNumber(lastc)) AddToken(ref token);

                    if (!IsWhitespace(c)) token += c;
                }
                else
                    token += c;

                lastc = c;
            }

            // Add last token
            AddToken(ref token);

            tokenIndex = 0;

        }

        public string Token()
        {
            return tokens[tokenIndex];
        }

        public string TokenType()
        {
            return tokenTypes[tokenIndex];
        }

        public void NextToken()
        {            
           tokenIndex++;            
        }

        public bool TokensLeft()
        {
            return tokenIndex < tokens.Count;
        }

        // Add a token to the collection
        private void AddToken(ref string token)
        {
            if (token.Trim() != "")
            {
                // Determine token type
                string tokenType = "Identifier";
                if (IsOperator(token[0])) tokenType = "Operator";
                if (IsNumber(token[0])) tokenType = "Number";
                if (token[0] == '"') tokenType = "String";
                if (Keywords.Contains(token.ToUpper())) tokenType = "Keyword";

                tokens.Add(token);
                tokenTypes.Add(tokenType);
                token = "";
            }
        }

        private bool IsWhitespace(char c)
        {
            return (c.ToString() != c.ToString().Trim());
        }

        private bool IsOperator(char c)
        {
            return Operators.Contains(c);
        }

        private bool IsNumber(char c)
        {
            return Char.IsNumber(c);
        }
    }


You usually don't want to write parser code like this by hand. Learning a good parser generator tool such as ANTLR is a good investment of your time if you're going to parse computer languages for more than just fun or a coding exercise. That being said, if you really want to do this manually, you have some things to think about:

  • Use a StringBuilder instead of a string
  • Just checking for quotes is not enough. What about escaped quotes?
  • How will you handle floating point numbers (in all formats)?
  • How will you handle identifiers that include numbers?

Those are a few of the issues you'll run into. Again, I really recommend learning a parser generator tool; it makes this kind of thing much more fun (not to mention correct and efficient).
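If you do stay with a hand-written tokenizer, the four points above could be handled roughly as in the sketch below. It is only one possible shape, not a drop-in replacement for the class in the question. The names TokenKind, Token and SketchScanner are made up for illustration, and several rules are assumptions on my part: a doubled quote ("") inside a string stands for one literal quote, numbers may have a fraction and an exponent (1, 1.5, .5, 2e10), identifiers start with a letter and may contain digits afterwards, and any character outside the question's operator list is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical types for illustration; they are not part of the question's code.
enum TokenKind { Keyword, Identifier, Number, String, Operator }

sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

static class SketchScanner
{
    const string Operators = "+-/*%<>=&|";
    static readonly HashSet<string> Keywords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "LET", "DIM", "PRINT", "REM" };

    public static List<Token> Scan(string line)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < line.Length)
        {
            char c = line[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            var text = new StringBuilder(); // avoids repeated string concatenation
            if (c == '"')
            {
                i++; // skip the opening quote
                while (i < line.Length)
                {
                    if (line[i] == '"')
                    {
                        // Assumption: "" inside a string is an escaped quote.
                        if (i + 1 < line.Length && line[i + 1] == '"') { text.Append('"'); i += 2; continue; }
                        break; // closing quote
                    }
                    text.Append(line[i++]);
                }
                if (i >= line.Length) throw new FormatException("Unterminated string literal");
                i++; // skip the closing quote
                tokens.Add(new Token(TokenKind.String, text.ToString()));
            }
            else if (char.IsDigit(c) || (c == '.' && i + 1 < line.Length && char.IsDigit(line[i + 1])))
            {
                // Integer part, optional fraction, optional exponent.
                while (i < line.Length && char.IsDigit(line[i])) text.Append(line[i++]);
                if (i < line.Length && line[i] == '.')
                {
                    text.Append(line[i++]);
                    while (i < line.Length && char.IsDigit(line[i])) text.Append(line[i++]);
                }
                if (i < line.Length && (line[i] == 'e' || line[i] == 'E'))
                {
                    int j = i + 1;
                    if (j < line.Length && (line[j] == '+' || line[j] == '-')) j++;
                    if (j < line.Length && char.IsDigit(line[j])) // only accept "e" if digits follow
                    {
                        text.Append(line, i, j - i);
                        i = j;
                        while (i < line.Length && char.IsDigit(line[i])) text.Append(line[i++]);
                    }
                }
                tokens.Add(new Token(TokenKind.Number, text.ToString()));
            }
            else if (char.IsLetter(c))
            {
                // Identifiers may contain digits after the first letter (e.g. x1).
                while (i < line.Length && char.IsLetterOrDigit(line[i])) text.Append(line[i++]);
                string word = text.ToString();
                tokens.Add(new Token(Keywords.Contains(word) ? TokenKind.Keyword : TokenKind.Identifier, word));
            }
            else if (Operators.IndexOf(c) >= 0)
            {
                tokens.Add(new Token(TokenKind.Operator, c.ToString()));
                i++;
            }
            else
            {
                throw new FormatException("Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }
}
```

Calling SketchScanner.Scan on a line such as LET x1 = 3.5e2 + "a""b" should give six tokens: a keyword, an identifier, an operator, a number, an operator and a string whose text is a"b. Branching on the first character of each token keeps every rule in one place, which may suit the question's preference for clarity over efficiency.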
