
How to tokenize in Java without using java.util.StringTokenizer?

Consider the following as tokens:

  1. +, -, ), (
  2. alphabetic characters and underscore
  3. integers

Implement:

  1. getToken() - returns a string corresponding to the next token
  2. getTokPos() - returns the position of the current token in the input string

Example input: (a+b)-21)

Output: (| a| +| b| )| -| 21| )|

Note: The Java StringTokenizer class cannot be used.

Work in progress - Successfully tokenized +, -, ), (. Need to figure out characters and numbers:

OUTPUT: +|-|+|-|(|(|)|)|)|(| |


java.util.StringTokenizer is a legacy class; it is kept for compatibility, but its use is discouraged in new code.

Since Java 1.4, tokenizing strings is much easier with String.split():

// '-' comes first in the character class so it is a literal, not a range
String[] tokens = "(a+b)-21)".split("[-+()]");
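To see what split() actually returns for the example input, here is a small sketch. The class and method names (SplitDemo, splitOnOperators) are made up for illustration; only String.split and Arrays.toString are standard library calls.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on the four operator characters. '-' is listed first in the
    // character class so that it is read literally rather than as a range.
    static String[] splitOnOperators(String input) {
        return input.split("[-+()]");
    }

    public static void main(String[] args) {
        // The operators themselves are discarded, and empty strings appear
        // where the input starts with an operator or two operators are adjacent.
        System.out.println(Arrays.toString(splitOnOperators("(a+b)-21)")));
        // prints [, a, b, , 21]
    }
}
```

The operators are missing from the result, so split() alone cannot produce the output the question asks for.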

If this is homework, you probably have to reimplement a "split" method yourself:

  • read the String character by character
  • if the character is not a special character, add it to a buffer
  • when you encounter a special character, add the buffer content to a list and clear the buffer

Since this is (probably) homework, I'll leave the implementation to you.
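For reference, the three steps listed above could look roughly like the sketch below. The names ManualSplit and isSpecial are hypothetical. Skipping empty buffers is an assumption of this sketch (the steps above do not say what to do with them).

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplit {
    // Hypothetical helper: the "special" characters from the question.
    static boolean isSpecial(char c) {
        return c == '+' || c == '-' || c == '(' || c == ')';
    }

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isSpecial(c)) {
                // Special character: move the buffer content to the list
                // and clear the buffer. Empty buffers are skipped (assumption).
                if (buffer.length() > 0) {
                    parts.add(buffer.toString());
                    buffer.setLength(0);
                }
            } else {
                // Ordinary character: collect it.
                buffer.append(c);
            }
        }
        // Flush whatever is left after the last special character.
        if (buffer.length() > 0) {
            parts.add(buffer.toString());
        }
        return parts;
    }
}
```

Calling ManualSplit.split("(a+b)-21)") gives the list [a, b, 21]. Like String.split(), this drops the special characters, so it still has to be extended to return them as tokens.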


Java lets you examine the characters in a String one by one with the charAt method. Use that in a for loop and examine each character. When you encounter a token character, wrap it in pipes; append any other character to the output unchanged.

public static final char PLUS_TOKEN = '+';
public static final char MINUS_TOKEN = '-';
public static final char OPEN_PAREN_TOKEN = '(';
public static final char CLOSE_PAREN_TOKEN = ')';

public String doStuff(String input)
{
    StringBuilder output = new StringBuilder();
    for (int index = 0; index < input.length(); index++)
    {
        char current = input.charAt(index);
        // compare the current character with all tokens
        if (current == PLUS_TOKEN || current == MINUS_TOKEN
                || current == OPEN_PAREN_TOKEN || current == CLOSE_PAREN_TOKEN)
        {
            // when you see a token you need to append the pipes (|) around it
            output.append('|');
            output.append(current);
            output.append('|');
        }
        else
        {
            // just add to new output
            output.append(current);
        }
    }
    return output.toString();
}


If it's not a homework assignment, use String.split(). If it is a homework assignment, say so and tag it so that we can give the appropriate level of help (I did so for you, just in case...).


Because the string has to be cut in several different ways, not just on whitespace or parentheses, String.split with any of these symbols as the delimiter will not work: split removes the characters used as separators. You could try splitting on the empty string, but then multi-character tokens such as 21 would be broken into single characters. To parse this string correctly you will effectively need to implement your own tokenizer. Think about how you could tell that you have a complete token while looking at the string one character at a time. You could collect characters into a string until you have identified a complete token, then remove those characters from the original input and return the collected string. From there you can build a basic tokenizer.
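A minimal sketch of such a hand-written tokenizer, using the getToken() and getTokPos() names from the question, might look like this. Everything else is an assumption made for illustration: the class name SimpleTokenizer, returning null at end of input, 0-based positions, skipping whitespace, treating only ASCII letters as alphabetic, and throwing on unknown characters. Instead of removing characters from the original string, it advances an index, which has the same effect.

```java
class SimpleTokenizer {
    private final String input;
    private int next = 0;     // index of the first unread character
    private int tokPos = -1;  // start index of the most recently returned token

    SimpleTokenizer(String input) {
        this.input = input;
    }

    // Letters and underscore form one token kind (ASCII only: an assumption).
    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the next token, or null when the input is exhausted (assumption).
    String getToken() {
        // Skipping whitespace between tokens is an assumption of this sketch.
        while (next < input.length() && Character.isWhitespace(input.charAt(next))) {
            next++;
        }
        if (next >= input.length()) {
            return null;
        }
        tokPos = next;
        char c = input.charAt(next);
        if (isAlpha(c)) {
            // Keep reading while the run of letters/underscores continues.
            while (next < input.length() && isAlpha(input.charAt(next))) next++;
        } else if (isDigit(c)) {
            // Keep reading while the run of digits continues.
            while (next < input.length() && isDigit(input.charAt(next))) next++;
        } else if (c == '+' || c == '-' || c == '(' || c == ')') {
            next++;  // operators are always a single character
        } else {
            throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at position " + next);
        }
        return input.substring(tokPos, next);
    }

    // Position (0-based) of the token most recently returned by getToken().
    int getTokPos() {
        return tokPos;
    }

    public static void main(String[] args) {
        SimpleTokenizer t = new SimpleTokenizer("(a+b)-21)");
        StringBuilder out = new StringBuilder();
        for (String tok = t.getToken(); tok != null; tok = t.getToken()) {
            out.append(tok).append("| ");
        }
        System.out.println(out.toString().trim());
        // prints (| a| +| b| )| -| 21| )|
    }
}
```

The key idea is that the first character of a token decides its kind, and the token is complete as soon as the next character no longer belongs to that kind.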

If you'd rather learn how to make a full-strength tokenizer, most of them are defined by writing a regular expression that matches only the tokens.
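As a rough illustration of the regular-expression approach, the sketch below uses java.util.regex with one alternative per token kind from the question. The class name RegexTokenizer and the exact pattern are assumptions, not a standard recipe. Note that Matcher.find() silently skips any characters that match no alternative, so a real tokenizer would need to check for gaps between matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // One alternative per token kind: operator, identifier, integer.
    private static final Pattern TOKEN =
            Pattern.compile("[-+()]|[A-Za-z_]+|[0-9]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // m.start() would give the position of this token in the input.
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

For the example input, RegexTokenizer.tokenize("(a+b)-21)") returns the list [(, a, +, b, ), -, 21, )].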
