Is there any need for me to use wstring in the following case

2022-12-23 15:47 Q&A author:

Currently, I am developing an app for a customer in China. Customers in China mostly use GB2312 as their OS encoding. I need to write a text file that will be encoded in GB2312.

I use std::ofstream to write the file.
I compile my application in MBCS mode, not Unicode.
I use the following code to convert a CString to a std::string, and write it to the file using ofstream:

std::string Utils::ToString(CString& cString) {
    /* Will not work correctly, if we are compiled under unicode mode. */
    return (LPCTSTR)cString;
}

To my surprise, it just works. I thought I would need to at least use wstring, so I did some investigation.

Here is the MBCS.txt generated.

[image: http://sites.google.com/site/yanchengcheok/Home/stackoverflow0.PNG]

I try to print a single character, 脚 (its value is 0xBDC5).
When I use CString to carry this character, its length is 2.
When I use Utils::ToString to convert it to std::string, the length of the returned string is 2.
I write it to the file using std::ofstream.

My questions are:

1. When I examine MBCS.txt using a hex editor, the value is displayed as BD (LSB) and C5 (MSB). But I am using a little-endian machine. Shouldn't the hex editor show me C5 (LSB) and BD (MSB)? I checked Wikipedia, and GB2312 doesn't seem to specify endianness.
2. It seems that using std::string + CString works fine for my case. In what cases will this approach not work, and when should I start to use wstring?

About 1. Endianness is a problem you meet when you serialize a unit in terms of smaller units (for example, serializing 16-bit units as octets). I'm far from being a specialist in CJK encodings, but it seems to me that GB2312 is a coded character set which can be used with several encoding schemes. The encoding schemes cited on the Wikipedia page as being used for GB2312 (ISO 2022, EUC-CN and HZ) are all defined in terms of octets. So there is no endianness issue when they are serialized as octets.
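To make the point concrete, here is a minimal sketch (not the asker's code) that writes the two octets from the question to a byte stream and reads them back in stream order. The helper names jiaoBytes and octetsAsWritten are made up for illustration, and the byte values BD C5 are taken from the question rather than verified independently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the two octets the question reports for the
// character, in the order the octet-based encoding defines them.
inline std::string jiaoBytes() {
    std::string bytes("\xBD\xC5", 2);
    assert(bytes.size() == 2);  // same length the question observed
    return bytes;
}

// Hypothetical helper: write a narrow string to a byte stream and return
// the octets exactly as they appear in the stream.
inline std::vector<unsigned char> octetsAsWritten(const std::string& text) {
    std::ostringstream out(std::ios::binary);
    out << text;  // a std::string is copied octet by octet, first to last
    const std::string written = out.str();
    return std::vector<unsigned char>(written.begin(), written.end());
}
```

Calling octetsAsWritten(jiaoBytes()) yields BD followed by C5 on any machine, because no multi-octet unit is ever split into octets here; the string already is a sequence of octets.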

Contrast this with the Unicode encoding schemes: UTF-8 is defined in terms of octets and has no endianness issue when serialized as octets; UTF-16 is defined in terms of 16-bit units, so its endianness must be specified when it is serialized as octets; UTF-32 is defined in terms of 32-bit units, so its endianness must likewise be specified when it is serialized as octets.
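As a rough illustration of where endianness does come in, the sketch below serializes a single 16-bit unit into two octets in both byte orders. The function names toBigEndian and toLittleEndian are hypothetical, and the value 0xBDC5 is used only because it mirrors the question's expectation; it is not a UTF-16 code unit for that character.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split one 16-bit unit into octets, most significant first.
inline std::array<std::uint8_t, 2> toBigEndian(std::uint16_t unit) {
    return {{ static_cast<std::uint8_t>(unit >> 8),
              static_cast<std::uint8_t>(unit & 0xFF) }};
}

// Hypothetical helper: split one 16-bit unit into octets, least significant first.
inline std::array<std::uint8_t, 2> toLittleEndian(std::uint16_t unit) {
    return {{ static_cast<std::uint8_t>(unit & 0xFF),
              static_cast<std::uint8_t>(unit >> 8) }};
}
```

If the character really were stored as the single 16-bit unit 0xBDC5, toLittleEndian would produce C5 BD, which is the order the question expected to see. That choice only exists when a 16-bit (or wider) unit has to be broken into octets, which is the UTF-16 and UTF-32 situation and not the octet-based GB2312 one.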
