Convert International String to \u Codes in Java
How can I convert an international (e.g. Russian) String to \u numbers (Unicode numbers), e.g. \u041e\u041a for "ОК"?
There is a JDK tool that can be run from the command line as follows:
native2ascii -encoding utf8 src.txt output.txt
Example :
src.txt
بسم الله الرحمن الرحيم
output.txt
\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a\u0645
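If you want to use it from your Java application, you can wrap the command like this:
// requires: import java.io.File;
String pathSrc = "./tmp/src.txt";
String pathOut = "./tmp/output.txt";
// Pass each argument separately so paths containing spaces are not split.
Process p = new ProcessBuilder("native2ascii", "-encoding", "utf8",
        new File(pathSrc).getAbsolutePath(),
        new File(pathOut).getAbsolutePath()).inheritIO().start();
// Wait for the tool to finish; otherwise the output file may not be complete yet.
int exitCode = p.waitFor();
System.out.println("native2ascii exited with code " + exitCode);
Then read the contents of the new file.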
You could use escapeJava from org.apache.commons.lang.StringEscapeUtils (the escapeJavaStyleString method it delegates to is not public in Commons Lang 2.x).
I had the reverse problem. I had some Portuguese text with special characters, but these characters were already in Unicode escape format (e.g. \u00e3).
So I wanted to convert S\u00e3o to São.
I did it using Apache Commons StringEscapeUtils, as @sorin-sbarnea said. It is part of Apache Commons Lang.
Use the method unescapeJava, like this:
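// The doubled backslash keeps the compiler from decoding the escape itself,
// so the string holds a literal backslash-u sequence at runtime.
String text = "S\\u00e3o";
text = StringEscapeUtils.unescapeJava(text);
System.out.println("text " + text);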
(There is also the method escapeJava, which goes the other way: it replaces the non-ASCII characters in the string with \u escapes.)
If anyone knows a pure-Java solution, please tell us.
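For what it's worth, the unescaping step can be sketched in pure Java with java.util.regex alone. This is only a minimal sketch and not what StringEscapeUtils does internally: the class and method names (UnicodeUnescaper, unescape) are made up for illustration, and it only decodes backslash-u sequences with four hex digits. Other Java escapes (such as \n), or an escaped backslash sitting in front of the u, are not treated specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class UnicodeUnescaper {
    // A backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    /** Replaces each backslash-u escape in the input with the character it encodes. */
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '\' or '$' from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash puts a literal escape sequence into the string at runtime.
        System.out.println(unescape("S\\u00e3o"));
    }
}
```

Text without any escape sequences passes through unchanged, so it should be safe to run over mixed input.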
Here's an improved version of ArtB's answer:
    StringBuilder b = new StringBuilder();
    for (char c : input.toCharArray()) {
        if (c >= 128)
            b.append("\\u").append(String.format("%04X", (int) c));
        else
            b.append(c);
    }
    return b.toString();
This version escapes all non-ASCII chars and works correctly for low Unicode code points like Ä.
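To try the snippet above on its own, it can be wrapped in a small class. The class name UnicodeEscaper, the method name escape, and the sample input are assumptions added for illustration; the loop is the same as above.

```java
// Hypothetical wrapper around the loop from the answer above.
final class UnicodeEscaper {
    /** Escapes every UTF-16 code unit at or above 128 as backslash-u plus four hex digits. */
    static String escape(String input) {
        StringBuilder b = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c >= 128) {
                // %04X zero-pads to four uppercase hex digits.
                b.append("\\u").append(String.format("%04X", (int) c));
            } else {
                b.append(c);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // Cyrillic "OK", written with escapes so this source file stays pure ASCII.
        String ok = "\u041e\u041a";
        System.out.println(escape(ok));
    }
}
```

For that input, escape returns \u041E\u041A, and plain ASCII passes through unchanged. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is also how Java source represents them.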
There are three parts to the answer:
- Get the Unicode for each character
- Determine if it is in the Cyrillic Page
- Convert to Hexadecimal.
To get each character you can iterate through the String using the charAt() or toCharArray() methods.
for( char c : s.toCharArray() )
The numeric value of each char is its UTF-16 code unit, which for Cyrillic characters is the same as the Unicode code point.
The Cyrillic Unicode characters are any character in the following ranges:
Cyrillic:            U+0400–U+04FF ( 1024 -  1279)
Cyrillic Supplement: U+0500–U+052F ( 1280 -  1327)
Cyrillic Extended-A: U+2DE0–U+2DFF (11744 - 11775)
Cyrillic Extended-B: U+A640–U+A69F (42560 - 42655)
If it is in one of these ranges, it is Cyrillic. Just perform an if check. If it is in range, format the value as four hexadecimal digits and prepend the "\\u". (Integer.toHexString() alone does not zero-pad, so U+0400 would come out as the invalid \u400.) Put together it should look something like this:
final int[][] ranges = new int[][]{
    {  1024,  1279 },  // Cyrillic
    {  1280,  1327 },  // Cyrillic Supplement
    { 11744, 11775 },  // Cyrillic Extended-A
    { 42560, 42655 },  // Cyrillic Extended-B
};
StringBuilder b = new StringBuilder();
for( char c : s.toCharArray() ){
    int[] insideRange = null;
    for( int[] range : ranges ){
        if( range[0] <= c && c <= range[1] ){
            insideRange = range;
            break;
        }
    }
    if( insideRange != null ){
        // Zero-pad to four hex digits so the escape is always valid.
        b.append( "\\u" ).append( String.format( "%04x", (int) c ) );
    }else{
        b.append( c );
    }
}
return b.toString();
Edit: probably should make the check c < 128 and reverse the if and the else bodies; you probably should escape everything that isn't ASCII. I was probably too literal in my reading of your question.
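As an alternative to hard-coding the numeric ranges, the JDK's Character.UnicodeBlock can answer the "is it Cyrillic?" question. The following is a sketch under that assumption: the class and method names (CyrillicEscaper, isCyrillic, escapeCyrillic) are invented for illustration, and it covers the same four blocks listed above (Java calls the Cyrillic Supplement block CYRILLIC_SUPPLEMENTARY).

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper, not part of any library.
final class CyrillicEscaper {
    /** True if the char falls in one of the four Cyrillic blocks listed above. */
    static boolean isCyrillic(char c) {
        UnicodeBlock block = UnicodeBlock.of(c);
        return block == UnicodeBlock.CYRILLIC
            || block == UnicodeBlock.CYRILLIC_SUPPLEMENTARY
            || block == UnicodeBlock.CYRILLIC_EXTENDED_A
            || block == UnicodeBlock.CYRILLIC_EXTENDED_B;
    }

    /** Escapes only Cyrillic chars; everything else is copied through unchanged. */
    static String escapeCyrillic(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (isCyrillic(c)) {
                // %04x zero-pads to four lowercase hex digits.
                b.append(String.format("\\u%04x", (int) c));
            } else {
                b.append(c);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // Cyrillic "OK", written with escapes so this source file stays pure ASCII.
        System.out.println(escapeCyrillic("\u041e\u041a"));
    }
}
```

This keeps the literal reading of the question (only Cyrillic is escaped), so other non-ASCII characters such as ã are left as they are.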
There's a command-line tool called native2ascii that ships with the JDK (through JDK 8; it was removed in JDK 9). It converts files containing Unicode text to ASCII files with \u escapes. I've found that this is a necessary step for generating .properties files for localization.
If you need this in order to write a .properties file, you can just add the Strings to a Properties object and then save it to a file. It will take care of the conversion.
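A minimal sketch of that approach, using only java.util.Properties. The class name PropertiesEscapeDemo, the helper method store, and the key "greeting" are made up for illustration, and the output goes to an in-memory stream rather than a file so the result is easy to inspect. Note that the escaping happens with store(OutputStream, ...); as far as I know, the store(Writer, ...) overload writes the characters as they are.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
final class PropertiesEscapeDemo {
    /** Stores one key/value pair and returns the text Properties would write to a file. */
    static String store(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // store(OutputStream, ...) writes ISO-8859-1 and escapes non-ASCII characters.
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Cyrillic "OK", written with escapes so this source file stays pure ASCII.
        System.out.print(store("greeting", "\u041e\u041a"));
    }
}
```

The output starts with a timestamp comment line, followed by the greeting entry with both Cyrillic letters written as \u escapes. To write a real file, pass a FileOutputStream to store instead.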
Apache commons StringEscapeUtils.escapeEcmaScript(String) returns a string with unicode characters escaped using the \u notation.