How do I translate strings using Java?

I want a translation routine that allows me to translate any character to any other character or set of characters efficiently. The obvious way seems to be to use the value of a character from the input string as an index into a 256-entry translation array.

Given an initial array where each entry is set to its value, e.g. hex'37' would appear in the 56th entry (allowing 00 to be the first), the user could then substitute any characters required in the translate string.

e.g.1 I want to map a string with "A" for alphabetic characters, "N" for numeric characters, "B" for space characters and "X" for anything else. Thus "SL5 3QW" becomes "AANBNAA".

e.g.2. I want to translate some characters, such as "œ" (x'9D') to "oe" (x'6F65'), "ß" to "ss", "å" to "a", etc.

How do I get a numeric value from a character in the input string to use it as an index into the translate array?

It's easy with function CODE in Excel and straightforward in IBM assembler, but I can't track down a method in Java.


This is a bit off topic, but if you want to do a comprehensive job of character translation, you cannot simply use String.charAt(int). Unicode codepoints larger than 65535 are represented in Java Strings as two consecutive char values.

The clean way to deal with this is to use String.codePointAt(int) to extract each code point, and String.offsetByCodePoints(int, int) to step through the code point positions.
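A minimal sketch of that loop, in case it helps. The class name CodePointWalk and the helper codePoints are made up for illustration; only the String methods come from the JDK.

```java
final class CodePointWalk {
    // Collects every code point of the input, stepping by code point
    // rather than by char, so surrogate pairs are read as one unit.
    static int[] codePoints(final String input) {
        final int[] out = new int[input.codePointCount(0, input.length())];
        int n = 0;
        for (int i = 0; i < input.length(); i = input.offsetByCodePoints(i, 1)) {
            out[n++] = input.codePointAt(i);
        }
        return out;
    }

    public static void main(final String[] args) {
        // "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF), which needs two char values.
        final String s = "a\uD834\uDD1E";
        System.out.println(s.length());            // 3 char values
        System.out.println(codePoints(s).length);  // 2 code points
    }
}
```

With charAt alone, the second character would show up as two separate surrogate values, neither of which is a real character.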


Unicode defines well over 100,000 characters (more than 107,000 as of this writing). A 256-entry translation array won't cut it.

That said, you can get the code point at an index in a string using the String.codePointAt(int index) method.

You might also want to use Character.isWhitespace(int codePoint), Character.isDigit(int codePoint) and so on.
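As a rough illustration of both points (the class CodePointInfo and its two helpers are hypothetical names, not JDK API):

```java
final class CodePointInfo {
    // Returns the numeric value (Unicode code point) of the character at index.
    static int code(final String s, final int index) {
        return s.codePointAt(index);
    }

    // Describes a code point using the Character classification methods.
    static String describe(final int codePoint) {
        if (Character.isDigit(codePoint)) {
            return "digit";
        } else if (Character.isWhitespace(codePoint)) {
            return "whitespace";
        } else if (Character.isLetter(codePoint)) {
            return "letter";
        }
        return "other";
    }

    public static void main(final String[] args) {
        final int value = code("7", 0);
        System.out.println(Integer.toHexString(value)); // 37
        System.out.println(describe(value));            // digit
    }
}
```

For characters in the Basic Multilingual Plane, a plain cast such as (int) s.charAt(0) gives the same number.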

See also http://download.oracle.com/javase/6/docs/api/java/lang/String.html and http://download.oracle.com/javase/6/docs/api/java/lang/Character.html


HashMap<String, String> should work just fine. No need to over-engineer such a simple problem.


As Christoffer says, with Unicode characters a 256-element array is not enough.

One way is to use a HashMap<Character,String> mapping each Character to the desired translated value, and use String.charAt() to extract each character in turn. You might also look at some of the methods on the Character class like isDigit() and isLetter() to do some of the work; that might be easier than constructing a mapping for every "letter" (in multiple languages, perhaps).

By using a HashMap, you only need to define mappings for the characters you wish to translate. For characters that have no mapping (HashMap.get() returns null) you could either substitute a default value or pass them through unchanged.
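Here is a small sketch of the HashMap approach with pass-through for unmapped characters. MapTranslator, sampleTable and translate are illustrative names, and the table holds only the three mappings from the question.

```java
import java.util.HashMap;
import java.util.Map;

final class MapTranslator {
    // Sample table: only the characters that need translating are listed.
    static Map<Character, String> sampleTable() {
        final Map<Character, String> table = new HashMap<Character, String>();
        table.put('\u0153', "oe"); // œ
        table.put('\u00DF', "ss"); // ß
        table.put('\u00E5', "a");  // å
        return table;
    }

    // Replaces mapped characters and passes everything else through unchanged.
    static String translate(final String input, final Map<Character, String> table) {
        final StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            final char c = input.charAt(i);
            final String replacement = table.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        System.out.println(translate("Stra\u00DFe", sampleTable())); // Strasse
    }
}
```

A StringBuilder is used instead of a char[] because one input character may expand to several output characters.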


There are different ways to answer this question. The easiest way is probably to come up with answers for each of the problems individually:


Problem 1:

e.g.1 I want to map a string with "A" for alphabetic characters, "N" for numeric characters, "B" for space characters and "X" for anything else. Thus "SL5 3QW" becomes "AANBNAA".

Simple solution:

public static String map(final String input){
    final char[] out = new char[input.length()];
    for(int i = 0; i < input.length(); i++){
        final char c = input.charAt(i);
        final char t;
        if(Character.isDigit(c)){
            t = 'N';
        } else if(Character.isWhitespace(c)){
            t = 'B';
        } else if(Character.isLetter(c)){
            t = 'A';
        } else{
            t = 'X';
        }
        out[i] = t;
    }
    return new String(out);
}

Test:

public static void main(final String[] args){
    System.out.println(map("SL5 3QW"));
}

Output:

AANBNAA


Problem 2:

e.g.2. I want to translate some characters, such as "œ" (x'9D') to "oe" (x'6F65'), "ß" to "ss", "å" to "a", etc.

Solution:

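This is mostly standard functionality: you should use the Normalizer API (java.text.Normalizer) for this. Normalizing to form NFD splits an accented letter such as "å" into "a" plus a combining mark, which you can then strip. Characters such as "œ" and "ß" have no Unicode decomposition, so they still need explicit replacements. See these previous answers for reference.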
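A sketch of how that could look. The class AccentFolder and method fold are made-up names. Since "œ" and "ß" have no Unicode decomposition, Normalizer alone does not expand them, so the sketch assumes two explicit replacements in front of the normalization step.

```java
import java.text.Normalizer;

final class AccentFolder {
    static String fold(final String input) {
        // Explicit replacements first: these two characters do not decompose.
        final String expanded = input
            .replace("\u0153", "oe")  // œ
            .replace("\u00DF", "ss"); // ß
        // NFD splits e.g. U+00E5 (å) into 'a' followed by U+030A (combining ring above).
        final String decomposed = Normalizer.normalize(expanded, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode general category M).
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(final String[] args) {
        System.out.println(fold("\u00E5"));      // a
        System.out.println(fold("\u0153uvre"));  // oeuvre
    }
}
```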


The Big Picture

But on second thought there is of course a more general solution to your problem. Let's see how many downvotes I get for this one from the if/else lovers. Define an interface for a transformer that accepts certain characters and/or character classes and maps them to other characters:

public interface CharTransformer{
    boolean supports(char input);
    char transform(char input);
}

And now define a method that you can call with a string and a collection of such transformers. For every single character, each transformer is queried in turn to see whether it supports that character. The first one that does performs the transformation. If no transformer is found for a character, an exception is thrown.

// Requires: import java.util.Collection;
public static String mapWithTransformers(final String input,
    final Collection<? extends CharTransformer> transformers){
    final char[] out = new char[input.length()];
    for(int i = 0; i < input.length(); i++){
        final char c = input.charAt(i);
        char t = 0;
        boolean matched = false;
        for(final CharTransformer tr : transformers){
            if(tr.supports(c)){
                matched = true;
                t = tr.transform(c);
                break;
            }
        }
        if(!matched){
            throw new IllegalArgumentException("Found no Transformer for char: "
                + c);
        }
        out[i] = t;
    }
    return new String(out);
}
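For what it's worth, here is one way the pieces could be wired together to solve Problem 1 again. TransformerDemo, classTransformer and classify are hypothetical names, the interface and the mapping method are repeated only so the sketch compiles on its own, and the lambdas assume Java 8 or later.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.IntPredicate;

interface CharTransformer {
    boolean supports(char input);
    char transform(char input);
}

final class TransformerDemo {
    // Builds a transformer that maps every char accepted by matcher to one fixed char.
    static CharTransformer classTransformer(final IntPredicate matcher, final char replacement) {
        return new CharTransformer() {
            @Override public boolean supports(final char input) { return matcher.test(input); }
            @Override public char transform(final char input) { return replacement; }
        };
    }

    // Same logic as the method above: the first supporting transformer wins.
    static String mapWithTransformers(final String input,
            final Collection<? extends CharTransformer> transformers) {
        final char[] out = new char[input.length()];
        for (int i = 0; i < input.length(); i++) {
            final char c = input.charAt(i);
            boolean matched = false;
            for (final CharTransformer tr : transformers) {
                if (tr.supports(c)) {
                    out[i] = tr.transform(c);
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Found no Transformer for char: " + c);
            }
        }
        return new String(out);
    }

    // Order matters: the catch-all 'X' transformer has to come last.
    static String classify(final String input) {
        final List<CharTransformer> transformers = Arrays.asList(
            classTransformer(Character::isDigit, 'N'),
            classTransformer(Character::isWhitespace, 'B'),
            classTransformer(Character::isLetter, 'A'),
            classTransformer(c -> true, 'X'));
        return mapWithTransformers(input, transformers);
    }

    public static void main(final String[] args) {
        System.out.println(classify("SL5 3QW")); // AANBNAA
    }
}
```

Note that transform returns a single char, so this design cannot express one-to-many replacements such as "ß" to "ss" without changing the return type to String.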

One more thing: Maps

Note: Others have suggested using a Map. While I don't think a standard map is good for this task, you could use Guava's MapMaker.makeComputingMap(function) to calculate the replacements as needed (and automatically cache them). That way you have a lazily initialized caching map. (Later Guava releases deprecated and then removed makeComputingMap in favor of CacheBuilder with a CacheLoader.)
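If you'd rather not pull in Guava, the same lazy-caching idea can be approximated with the JDK alone. This sketch swaps in ConcurrentHashMap.computeIfAbsent (Java 8 or later) in place of a Guava computing map. The names LazyReplacements, compute, replacementFor and translate are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LazyReplacements {
    // Filled on demand: each character is classified at most once, then cached.
    private static final Map<Character, String> CACHE =
        new ConcurrentHashMap<Character, String>();

    // The "function" that computes a replacement the first time a character is seen.
    private static String compute(final char c) {
        if (Character.isDigit(c)) {
            return "N";
        } else if (Character.isWhitespace(c)) {
            return "B";
        } else if (Character.isLetter(c)) {
            return "A";
        }
        return "X";
    }

    static String replacementFor(final char c) {
        return CACHE.computeIfAbsent(c, key -> compute(key));
    }

    static String translate(final String input) {
        final StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(replacementFor(input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        System.out.println(translate("SL5 3QW")); // AANBNAA
    }
}
```

Unlike a Guava cache, this map never evicts anything, which is fine for a bounded alphabet such as char but worth keeping in mind.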
