
_wsopen_s with the _O_U8TEXT flag: _read returns a 0 between characters in the buffer, and a 4 between Russian characters (VS2010)

If I read a UTF-8 encoded file such as

example.html

<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Текст на русском</title>

where "Текст на русском" ("Text in Russian") is the Russian text,

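#include <string>
#include <ios>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <io.h>
#include <share.h>   // _SH_DENYNO
#include <stdio.h>
#include <stdlib.h>  // exit

using namespace std;

int main()
{
    int fl;
    unsigned int nbytes = 60000;
    int bspr;        // _read returns int: -1 on error, 0 at EOF
    char buf[60000];

    // Open read-only in UTF-8 text mode (previously tried: &fh, "c:\\example.html", _O_RDONLY, ...)
    errno_t err = _wsopen_s(&fl, L"c:\\example.html", _O_RDONLY | _O_U8TEXT, _SH_DENYNO, _S_IREAD | _S_IWRITE);
    if (err != 0) exit(1);

    // Read up to nbytes bytes into buf
    if ((bspr = _read(fl, buf, nbytes)) <= 0)
    {
        perror("Error reading file");
        exit(1);
    }

    _close(fl);
    return 0;
}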

I get buf[0]=60 '<', buf[1]=0, buf[2]=104 'h', buf[3]=0, and so on,

until I reach the Russian letters. Then I get completely wrong symbols, such as 20 '', each followed by 4 ''.

('' is how Visual Studio displays these characters; strangely, it looks the same for both 20 and 4.)

So my question is: is there any way to read the whole file, up to EOF, into a properly decoded string, even if that means not using _read?


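It looks like _O_U8TEXT causes _read to convert from UTF-8 to UTF-16. The buffer receives wchar_t code units stored little-endian, so an ASCII character such as '<' (U+003C) arrives as the bytes 60, 0, and each Cyrillic letter (U+0400 to U+04FF) arrives as its low byte followed by 4. When you open a stream in a Unicode mode, you should probably read it with high-level wide-character functions such as getwc. You could use _wfopen_s with L"rt, ccs=UTF-8", or, if you need the sharing support, keep your existing _wsopen_s call and follow it with _wfdopen.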

