Convert International String to \u Codes in java

2023-03-10 11:16 问答作者：

How can I convert an international (e.g. Russian) String to \u numbers (unicode num开发者_开发技巧bers)

e.g. \u041e\u041a for OK ?

there is a JDK tools executed via command line as following :

native2ascii -encoding utf8 src.txt output.txt

Example :

src.txt

بسم الله الرحمن الرحيم

output.txt

\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a\u0645

If you want to use it in your Java application, you can wrap this command line by :

String pathSrc = "./tmp/src.txt";
String pathOut = "./tmp/output.txt";
String cmdLine = "native2ascii -encoding utf8 " + new File(pathSrc).getAbsolutePath() + " " + new File(pathOut).getAbsolutePath();
Runtime.getRuntime().exec(cmdLine);
System.out.println("THE END");

Then read content of the new file.

You could use escapeJavaStyleString from org.apache.commons.lang.StringEscapeUtils.

I also had this problem. I had some Portuguese text with some special characters, but these characters where already in unicode format (ex.: \u00e3).

So I want to convert S\u00e3o to São.

I did it using the apache commons StringEscapeUtils. As @sorin-sbarnea said. Can be downloaded here.

Use the method unescapeJava, like this:

String text = "S\u00e3o"
text = StringEscapeUtils.unescapeJava(text);
System.out.println("text " + text);

(There is also the method escapeJava, but this one puts the unicode characters in the string.)

If any one knows a solution on pure Java, please tell us.

Here's an improved version of ArtB's answer:

    StringBuilder b = new StringBuilder();

    for (char c : input.toCharArray()) {
        if (c >= 128)
            b.append("\\u").append(String.format("%04X", (int) c));
        else
            b.append(c);
    }

    return b.toString();

This version escapes all non-ASCII chars and works correctly for low Unicode code points like Ä.

There are three parts to the answer

Get the Unicode for each character
Determine if it is in the Cyrillic Page
Convert to Hexadecimal.

To get each character you can iterate through the String using the charAt() or toCharArray() methods.

for( char c : s.toCharArray() )

The value of the char is the Unicode value.

The Cyrillic Unicode characters are any character in the following ranges:

Cyrillic:            U+0400–U+04FF ( 1024 -  1279)
Cyrillic Supplement: U+0500–U+052F ( 1280 -  1327)
Cyrillic Extended-A: U+2DE0–U+2DFF (11744 - 11775)
Cyrillic Extended-B: U+A640–U+A69F (42560 - 42655)

If it is in this range it is Cyrillic. Just perform an if check. If it is in the range use Integer.toHexString() and prepend the "\\u". Put together it should look something like this:

final int[][] ranges = new int[][]{ 
        {  1024,  1279 }, 
        {  1280,  1327 }, 
        { 11744, 11775 }, 
        { 42560, 42655 },
    };
StringBuilder b = new StringBuilder();

for( char c : s.toCharArray() ){
    int[] insideRange = null;
    for( int[] range : ranges ){
        if( range[0] <= c && c <= range[1] ){
            insideRange = range;
            break;
        }
    }

    if( insideRange != null ){
        b.append( "\\u" ).append( Integer.toHexString(c) );
    }else{
        b.append( c );
    }
}

return b.toString();

Edit: probably should make the check c < 128 and reverse the if and the else bodies; you probably should escape everything that isn't ASCII. I was probably too literal in my reading of your question.

There's a command-line tool that ships with java called native2ascii. This converts unicode files to ASCII-escaped files. I've found that this is a necessary step for generating .properties files for localization.

In case you need this to write a .properties file you can just add the Strings into a Properties object and then save it to a file. It will take care for the conversion.

Apache commons StringEscapeUtils.escapeEcmaScript(String) returns a string with unicode characters escaped using the \u notation.

"Art of Beer

继续阅读：escapingunicodeunicode-escapes


                            更多精彩内容
                            Java中如何正确的停掉线程
Spring Cache 整合 Redis 实现高效缓存的方法
深度解析Java @Serial 注解及常见错误案例
Spring MVC处理HTTP状态码、响应头和异常的完整示例
QT Creator配置Kit的实现示例

Convert International String to \u Codes in java

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？