How to portably write std::wstring to file?

2023-01-22 08:43 问答作者：

I have a wstring declared as such:

// random wstring
std::wstring str = L"abcàdëefŸg€hhhhhhhµa";

~~The literal would be UTF开发者_StackOverflow社区-8 encoded, because my source file is.~~

[EDIT: According to Mark Ransom this is not necessarily the case, the compiler will decide what encoding to use - let us instead assume that I read this string from a file encoded in e.g. UTF-8]

I would very much like to get this into a file reading (when text editor is set to the correct encoding)

abcàdëefŸg€hhhhhhhµa

but ofstream is not very cooperative (refuses to take wstring parameters), and wofstream supposedly needs to know locale and encoding settings. I just want to output this set of bytes. How does one normally do this?

EDIT: It must be cross platform, and should not rely on the encoding being UTF-8. I just happen to have a set of bytes stored in a wstring, and want to output them. It could very well be UTF-16, or plain ASCII.

For std::wstring you need std::wofstream

std::wofstream f(L"C:\\some file.txt");
f << str;
f.close();

std::wstring is for something like UTF-16 or UTF-32, not UTF-8. For UTF-8, you probably just want to use std::string, and write out via std::cout. Just FWIW, C++0x will have Unicode literals, which should help clarify situations like this.

Why not write the file as a binary. Just use ofstream with the std::ios::binary setting. The editor should be able to interpret it then. Don't forget the Unicode flag 0xFEFF at the beginning. You might be better of writing with a library, try one of these:

http://www.codeproject.com/KB/files/EZUTF.aspx

http://www.gnu.org/software/libiconv/

http://utfcpp.sourceforge.net/

There is a (Windows-specific) solution that should work for you here. Basically, convert wstring to UTF-8 codepage and then use ofstream.

#include < windows.h >

std::string to_utf8(const wchar_t* buffer, int len)
{
        int nChars = ::WideCharToMultiByte(
                CP_UTF8,
                0,
                buffer,
                len,
                NULL,
                0,
                NULL,
                NULL);
        if (nChars == 0) return "";

        string newbuffer;
        newbuffer.resize(nChars) ;
        ::WideCharToMultiByte(
                CP_UTF8,
                0,
                buffer,
                len,
                const_cast< char* >(newbuffer.c_str()),
                nChars,
                NULL,
                NULL); 

        return newbuffer;
}

std::string to_utf8(const std::wstring& str)
{
        return to_utf8(str.c_str(), (int)str.size());
}

int main()
{
        std::ofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary); 

        std::wstring text =
                L"< ?xml version=\"1.0\" encoding=\"UTF-8\"? >\n"
                L"< root description=\"this is a naïve example\" >\n< /root >";

        std::string outtext = to_utf8(text);

        testFile << outtext;

        testFile.close();

        return 0;
}

C++ has means to perform a conversion from wide character to localized ones on output or file write. Use codecvt facet for that purpose.

You may use standard std::codecvt_byname, or a non-standard codecvt_facet implementation.

#include <locale>
using namespace std;
typedef codecvt_facet<wchar_t, char, mbstate_t> Cvt;
locale utf8locale(locale(), new codecvt_byname<wchar_t, char, mbstate_t> ("en_US.UTF-8"));
wcout.imbue(utf8locale);
wcout << L"Hello, wide to multybyte world!" << endl;

Beware that on some platforms codecvt_byname can only emit conversion only for locales that are installed in the system. I therefore recommend to search stackoverflow for "utf8 codecvt " and make a choice from many referenes of custom codecvt implementations listed.

EDIT: As OP states that the string is already encoded, all he should do is to remove prefixes L and "w" from every token of his code.

Note that wide streams output only char * variables, so maybe you should try using the c_str() member function to convert a std::wstring and then output it to the file. Then it should probably work?

You should not use UTF-8 encoded source file if you want to write portable code. Sorry.

  std::wstring str = L"abcàdëefŸg€hhhhhhhµa";

(I am not sure if this actually hurts the standard, but I think it is. But even if, to be safe you should not.)

Yes, purely using std::ostream will not work. There are many ways to convert a wstring to UTF-8. My favorite is using the International Components for Unicode. It's a big lib, but it's great. You get a lot of extras and things you might need in the future.

I had the same problem some time ago, and wrote down the solution I found on my blog. You might want to check it out to see if it might help, especially the function wstring_to_utf8.

http://pileborg.org/b2e/blog5.php/2010/06/13/unicode-utf-8-and-wchar_t

From my experience of working with different character encodings I would recommend that you only deal with UTF-8 at load and save time. You're in for a world of pain if you try and store the internal representation in UTF-8 since a single character could be anything from 1 byte to 4. So simple operations like strlen require looking at every byte to decide len rather than the allocated buffer (although you can optimize by looking at the first byte in the char sequence, e.g. 00..7f is a single byte char, c2..df indicates a 2 byte char etc).

People quite often refer to 'Unicode strings' when they mean UTF-16 and on Windows a wchar_t is a fixed 2 bytes. In Windows I think wchar_t is simply:

typedef SHORT wchar_t;

The full UTF-32 4 byte representation is rarely required and very wasteful, here what the Unicode Standard (5.0) has to say on it:

"On average more than 99% of all UTF-16 is expressed using single code units... UTF-16 provides the right mix of compact size with the ability to handle the occassional character outside the BMP"

In short, use whcar_t as your internal representation and do conversions when loading and saving (and don't worry about full Unicode unless you know you need it).

With regard to performing the actual conversion have a look at the ICU project:

http://site.icu-project.org/

继续阅读：file unicode wofstream wstring

How to portably write std::wstring to file?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？