Finding Unicode for non-English characters

2022-12-17 03:42

I have to print a non-English string in a Java program. I have the string with me. How do I get the Unicode values of its constituent characters so that I can embed the string within the program?

In which code page do you have that string? Java sources can be in any encoding, so you can put that string right in the source and use the compiler's options to set the encoding. See NetBeans -> Project node -> Properties -> Source -> Encoding.

The source files were being encoded as "MacRoman" (I found this under Project Properties -> Resource -> Text file encoding). I changed it to "UTF-8", then embedded the actual non-English string in the program and printed it. It worked.

You were perhaps corrupting data either on save or during compilation. Source code doesn't carry any intrinsic encoding information, so it is easy to corrupt string literals that contain characters outside the basic "ASCII" range. Consider using Unicode escape sequences in your source files to avoid this problem. You either do that or you ensure that anyone who comes into contact with the source handles it appropriately at all times - the first way is easier.
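
As a minimal sketch of the escape-sequence approach, the class below stores the three characters 日本語 using only ASCII in the source file. The class and constant names are made up for illustration. Whether the string displays correctly when printed still depends on the console's encoding, so the example only checks the string's contents.

```java
// Hypothetical demo class; the names are not from the original answers.
class EscapedLiteralDemo {
    // The source text is pure ASCII. The compiler turns each escape
    // into one UTF-16 code unit, so the file encoding no longer matters.
    static final String TEXT = "\u65e5\u672c\u8a9e";

    public static void main(String[] args) {
        System.out.println(TEXT.length());                        // number of UTF-16 units
        System.out.println(Integer.toHexString(TEXT.charAt(0)));  // hex value of the first unit
    }
}
```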

If this is for a commercial application, consider externalizing the strings to a resource file.
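
Externalizing could look roughly like the sketch below, which uses java.util.Properties. To stay self-contained it reads the properties text from a string instead of a file. The key "greeting" and the class name are assumptions made for illustration. A real application would more likely load a .properties file, for example through ResourceBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; the names are not from the original answers.
class ExternalizedStringsDemo {
    // Parses properties-format text and returns the value stored under key.
    static String lookup(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a resource file. The doubled backslashes
        // put a literal backslash-u sequence into the text, which
        // Properties.load decodes when it parses the line.
        String fileText = "greeting=\\u65e5\\u672c\\u8a9e\n";
        System.out.println(lookup(fileText, "greeting").length());
    }
}
```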

Java: a rough guide to character encoding
Java: character inspector application

As previous answers said, you can definitely write strings containing characters that can't be encoded in the conventional ISO-8859-1 or US-ASCII character sets directly in the source file. You do need to make sure your IDE saves the file as UTF-8, and you may need to add "-encoding UTF-8" to your javac command to ensure javac reads it correctly.
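
To illustrate why the reader and writer of a file must agree on the encoding, here is a small sketch that encodes a character as UTF-8 and then decodes the bytes with a different charset. ISO-8859-1 is used as the stand-in for the wrong charset only because every Java runtime is guaranteed to provide it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the original answers.
class WrongEncodingDemo {
    // Encodes s as UTF-8, then decodes those bytes as ISO-8859-1,
    // simulating a tool that reads a UTF-8 file with the wrong setting.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset, so nothing is lost.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, one character
        System.out.println(misread(original).length());          // two characters after the misread
        System.out.println(roundTrip(original).equals(original)); // unchanged when charsets match
    }
}
```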

But I think you're wondering how to embed the string using the "\uXXXX" syntax, perhaps to avoid any issues with the source file encoding. This short code snippet will probably work for you; it crudely assumes that any character whose UTF-16 value is over 255 needs to be escaped.

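public class UnicodeEscaper {
  // Pass the string to escape as the first command-line argument.
  public static void main(String[] args) {
    String s = args[0];
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 256) {
        // Crude cutoff: characters up to 255 are printed unchanged.
        System.out.print(c);
      } else {
        // A Unicode escape needs exactly four hex digits, so zero-pad the value.
        System.out.printf("\\u%04x", (int) c);
      }
    }
  }
}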
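
If the goal is to inspect the characters and not to generate escapes, a sketch along these lines lists each code point in the usual U+XXXX notation. It assumes Java 8 or later for String.codePoints(), and the class and method names are made up for illustration. Unlike a charAt loop, it reports a character outside the Basic Multilingual Plane as one code point instead of two surrogates.

```java
// Hypothetical demo class; the names are not from the original answers.
class CodePointLister {
    // Returns the code points of s as space-separated U+XXXX labels.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            // At least four hex digits, more for supplementary characters.
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Falls back to a small sample when no argument is given.
        String input = args.length > 0 ? args[0] : "A\u00e9\u65e5";
        System.out.println(describe(input));
    }
}
```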

python -c "print repr('text goes here'.decode('utf-8'))"

The encoding may not always be 'utf-8', but that is a sane starting point. Note that the command above uses Python 2 syntax.
