Unicode questions

1.)What is the difference between C# and Java Unicode ?

2.)Is C++ Unicode limited to only Windows applications ?

3.)Do I always have to add specific C++ package of code that differs from usual to be able to use Unicode in C++ ?

4.)What Unicode language is the most supported on all platforms ?

5.)Did Microsoft start this Unicode trend or are there any other older Unicode languages besides .NET and Java that supported Unicode from ground up ?


Unicode is not bound to a programming language. You might want to read this to clear things up.


I'm a Japanese developer. I'll try to answer your questions.

1.)What is the difference between C# and Java Unicode ?

This question is very difficult to answer. When we use Unicode, there are many aspects to consider, e.g. font support, native code mapping, input methods, ...

The simple answer is: both C# and Java use UTF-16 internally for their strings (wchar_t in C/C++ is also UTF-16 on Windows). Therefore they are almost the same, and we have no problem using Unicode with them.

2.)Is C++ Unicode limited to only Windows applications ?

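The C and C++ standards define wchar_t as a wide character type, and compilers commonly use it to hold Unicode characters. Its size and encoding are implementation-defined: 16-bit UTF-16 on Windows, usually 32-bit UTF-32 on Linux. You can use wchar_t with any C/C++ compiler, so it is not limited to Windows.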

3.)Do I always have to add specific C++ package of code that differs from usual to be able to use Unicode in C++ ?

To handle Unicode with wchar_t, you need to use the wide-character versions of the library functions: in C, wprintf, wscanf, ...; in C++, std::wcout, std::wcin, ... (Visual C++ has a UNICODE compile option; you need to check it before compiling.)

4.)What Unicode language is the most supported on all platforms ?

If this means 'UTF-8'/'UTF-16': as I already mentioned, these platforms use UTF-16 as their internal encoding. But when an application reads data from outside or writes data out, it may need to convert between UTF-16 and UTF-8 or a native encoding.

In Japan, we usually use the Shift-JIS encoding (one of our native character encodings) on Windows. But recently many utilities (such as text editors) support UTF-16/UTF-8, so we may not need to convert.

5.)Did Microsoft start this Unicode trend or are there any other older Unicode languages besides .NET and Java that supported Unicode from ground up ?

I think the Windows NT line (Windows NT, 2000, XP) was the first Windows to use Unicode internally. Win95/98 use native character encodings (Japanese Win95/98 use Shift-JIS internally).


  1. Unicode is a standard that is independent of C#, Java, or any other programming language.

  2. No.

  3. You don't. If your compiler and system support Unicode, this will work:

    ofstream fout("aaa.txt");
    fout << "Hi, привет\n"; 
    
  4. English.

  5. No. C has had wide characters (wchar_t) since C90, before Java or .NET existed.

EDIT: See Unicode answers for portable Unicode solutions in C++.


1.) Unicode is language independent.
2.) No, see #1.
3.) I do not know, sorry.
4.) If you mean the Unicode encoding: UTF-8.
5.) I do not know, sorry.


1.)What is the difference between C# and Java Unicode ?

Unicode is separate from both languages/environments. Both environments support it, and when reading or writing text in one of the Unicode encodings, it's important that they both adhere to the spec from the Unicode Consortium.

2.)Is C++ Unicode limited to only Windows applications ?

No.

3.)Do I always have to add specific C++ package of code that differs from usual to be able to use Unicode in C++ ?

Sorry, don't know this one. (Edit: ebungalobill says no, if the compiler and environment support it and you're using modern constructs.)

4.)What Unicode language is the most supported on all platforms ?

You probably mean what encoding. I don't know, but I expect that a platform either does, or doesn't, support the popular encodings (UTF-8 and UTF-16).

If you mean what human language, it would be English and most other western languages, but support for eastern languages is also robust at this stage.

5.)Did Microsoft start this Unicode trend or are there any other older Unicode languages besides .NET and Java that supported Unicode from ground up ?

Well, Java predates .NET by quite a lot. But Unicode predates them both.

Suggested reading (in order):

  1. http://www.joelonsoftware.com/articles/Unicode.html
  2. http://en.wikipedia.org/wiki/Unicode
  3. http://www.unicode.org/faq/basic_q.html
  4. http://www.unicode.org/faq//utf_bom.html


Unicode is an international standard that assigns numeric values (code points) to characters. There are also standards (UTF-8, UTF-16, etc.) for representing these numeric values as an ordered sequence of bytes. Note that there are many other standards for assigning numeric values to characters, e.g. ASCII, US-ASCII, the various Microsoft code page mappings, not to mention mappings for non-English character sets.

These values have no programming language dependencies: Unicode values are independent of C#, C++, or Java. Unlike earlier languages such as C and C++, Java had the Unicode representation of characters built in, so you don't have to do anything 'special' to support Unicode.

That said, you still have to distinguish between the Unicode 'value' of a character and how this 'value' is represented in your data file. E.g., a data file containing ASCII characters can be read into Java Strings, but you have to tell Java that you are reading a file containing ASCII mappings, not one containing some other mapping. (By the way, the Unicode folks made that easy: the first 128 Unicode values are the same as ASCII, and moreover the representation of these 128 characters in UTF-8 is identical to ASCII. Thus, any ASCII data can be treated as UTF-8 encoded Unicode.)
