
utfcpp and Win32 wide API

Is it good practice (and safe) to use the tiny utfcpp library's utf16to8 to convert everything I get back from the wide Windows API (FindFirstFileW and such) to valid UTF8?

I would like to use UTF8 internally, but I'm having trouble getting the correct output (via wcout after another conversion, or via plain cout). Normal ASCII characters work, of course, but characters like ñ and ä get mangled.

Or is there an easier alternative?

Thanks!

UPDATE: Thanks to Hans (below), I now have an easy UTF8<->UTF16 conversion through the Windows API. Conversion works in both directions, although the UTF8 string produced from UTF16 contains some extra characters that might cause me trouble later on. I'll share the code here out of pure friendliness :)

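#include <windows.h>
#include <stdexcept>
#include <string>

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    if( input.empty() )
        return std::string();

    // Get the required length. Passing the explicit size (not -1) means the
    // count excludes the terminating NUL, so no stray '\0' ends up in result.
    // For CP_UTF8, dwFlags must be 0 (or WC_ERR_INVALID_CHARS), not NULL.
    int length = WideCharToMultiByte( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0,
                                      NULL, NULL );
    if( length <= 0 )
        throw std::runtime_error( "Failure to execute toUTF8: could not determine length." );

    std::string result( length, '\0' );
    if( WideCharToMultiByte( CP_UTF8, 0,
                             input.c_str(), static_cast<int>( input.size() ),
                             &result[0], length,
                             NULL, NULL ) <= 0 )
        throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );

    return result;
}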
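// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    if( input.empty() )
        return std::wstring();

    // Get the required length in wchar_t units (no terminator, since the
    // explicit size is passed). For CP_UTF8, dwFlags must be 0 (or
    // MB_ERR_INVALID_CHARS), not NULL.
    int length = MultiByteToWideChar( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0 );
    if( length <= 0 )
        throw std::runtime_error( "Failure to execute toUTF16: could not determine length." );

    std::wstring result( length, L'\0' );
    if( MultiByteToWideChar( CP_UTF8, 0,
                             input.c_str(), static_cast<int>( input.size() ),
                             &result[0], length ) <= 0 )
        throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );

    return result;
}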


The Win32 API already has a function to do this: WideCharToMultiByte() with CodePage = CP_UTF8 (and MultiByteToWideChar() for the opposite direction). That saves you from having to rely on another library.
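To make concrete what that conversion produces, and why ñ and ä are exactly the characters that break, here is a hand-written sketch of the UTF-16 to UTF-8 encoding. It is an illustration only, not utfcpp's or Windows' actual code; the function name utf16ToUtf8Sketch is hypothetical, and it uses char16_t/std::u16string as a portable stand-in for Windows' 16-bit wchar_t.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration of UTF-16 -> UTF-8 encoding.
// char16_t stands in for Windows' 16-bit wchar_t so this compiles anywhere.
std::string utf16ToUtf8Sketch(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        // A high surrogate must be followed by a low surrogate; together
        // they encode one code point above U+FFFF.
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {                 // ASCII: 1 byte, unchanged
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // e.g. U+00F1 'ñ': 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // rest of the BMP: 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // supplementary planes: 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, u"\u00F1\u00E4" ("ñä") becomes the four bytes C3 B1 C3 A4, while plain ASCII passes through unchanged. Printed through a console that is still on its 8-bit OEM code page, each two-byte sequence shows up as two unrelated characters, which matches the symptom in the question. One deliberate difference from the API: this sketch throws on an unpaired surrogate, whereas WideCharToMultiByte with dwFlags = 0 substitutes U+FFFD.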

You cannot normally print the result correctly with wcout (or cout). The output goes to the console, which uses an 8-bit OEM code page for legacy reasons. You can change the console's output code page with SetConsoleOutputCP(); 65001 is the code page for UTF-8 (CP_UTF8). Note that SetConsoleCP() only changes the input code page.
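A minimal sketch of that fix, assuming the text you print is already UTF-8 (for instance, the output of the toUTF8 function from the question) and that you write it through plain std::cout rather than wcout. The helper name useUtf8ConsoleOutput is hypothetical, and the non-Windows branch simply assumes the terminal is already UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console's output code page to UTF-8 so
// UTF-8 bytes written with std::cout are decoded as UTF-8.
// Returns false if the switch fails (e.g. no console is attached).
bool useUtf8ConsoleOutput()
{
#ifdef _WIN32
    // SetConsoleOutputCP governs how the console decodes *output* bytes;
    // SetConsoleCP would only change the *input* code page.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    // Assumption: outside Windows the terminal is already UTF-8,
    // so there is nothing to switch.
    return true;
#endif
}
```

Typical use would be to call it once at startup and then write UTF-8 strings, e.g. `std::cout << toUTF8(L"\u00F1\u00E4") << '\n';`. Even then, the characters only render if the console font has glyphs for them.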

Your next stumbling block would be the font used by the console. You'll have to change it, but finding a fixed-pitch font with glyphs covering all of Unicode is going to be difficult. If you see empty rectangles in the output, you have a font problem; question marks indicate an encoding problem.


Why do you want to use UTF8 internally? Are you working with so much text that using UTF16 would create unreasonable memory demands? Even if that were the case, you're probably better off using wide chars anyway and dealing with memory issues some other way (a disk cache, better algorithms or better data structures).
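To put the memory question in numbers, here is a rough sketch that counts how many bytes the same text would occupy as UTF-8 versus UTF-16. The function names are hypothetical, char16_t again stands in for Windows' wchar_t, and lone surrogates are counted as the 3 bytes of the U+FFFD replacement character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to store `s` as UTF-8.
std::size_t utf8ByteCount(const std::u16string& s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;                      // ASCII
        } else if (u < 0x800) {
            bytes += 2;                      // Latin accents, Greek, Cyrillic...
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()
                   && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                      // surrogate pair: one code point
            ++i;                             // above U+FFFF, 4 bytes either way
        } else {
            bytes += 3;                      // rest of the BMP (e.g. CJK)
        }
    }
    return bytes;
}

// Hypothetical helper: bytes needed to store `s` as UTF-16.
std::size_t utf16ByteCount(const std::u16string& s)
{
    return s.size() * sizeof(char16_t);      // two bytes per code unit
}
```

ASCII-heavy text is half the size in UTF-8 ("abc": 3 bytes vs 6), accented Latin letters cost the same ("ñä": 4 vs 4), and CJK text is larger in UTF-8 (two ideographs: 6 vs 4). So the saving is at most a factor of two and depends entirely on the text.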

Your code will be much cleaner and easier to deal with if you use the wide chars native to the Win32 API internally, and only convert to UTF8 when reading or writing data that requires it (e.g. XML files or REST APIs).

Your problem may also lie in how you print your output to the console; see the question "Output unicode strings in Windows console app".

Finally, I haven't used the utfcpp library, but UTF8 conversions are fairly trivial to perform using Win32's WideCharToMultiByte and MultiByteToWideChar with CP_UTF8 as the code page. Personally, I would do a one-time conversion and work with the text in UTF16 until it was time to output or transfer it as UTF8, if needed.
