_wsopen_s with the _O_U8TEXT flag: _read puts 0 in the buffer between chars, and 4 between Russian chars (VS2010)
If I read a UTF-8 encoded file such as
example.html
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Текст на русском</title>
where "Текст на русском" is text in Russian, using the following code:
#include <string>
#include <ios>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <io.h>
#include <stdio.h>
using namespace std;
int main ()
{
    int fl;
    unsigned int nbytes = 60000, bspr;
    char buf [60000];
    errno_t err = _wsopen_s(&fl, L"c:\\example.html", _O_U8TEXT, _SH_DENYNO, _S_IREAD | _S_IWRITE); // &fh,"c:\\example.html",_O_RDONLY,
    if ( err != 0 ) exit (1);
    if ((bspr = _read(fl, buf, nbytes)) <= 0)
    {
        perror (" Error opening file ");
        exit (1);
    }
}
I get buf[0] = 60 '<', buf[1] = 0, buf[2] = 104 'h', buf[3] = 0, and so on,
until I reach the Russian letters; then I get completely wrong symbols such as 20 '', each followed by 4 ''.
Here '' is how Visual Studio displays the character; strangely, it shows the same thing for both 20 and 4.
So the question is: is there any way to read the file into a properly formatted string up to EOF, even if that means not using this function?
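It looks like _O_U8TEXT causes _read to convert from UTF-8 to UTF-16. That would explain the output: the char buffer holds little-endian UTF-16 code units, so every ASCII character is followed by a 0 byte, and every Cyrillic letter (U+0400 to U+04FF) is followed by a 4 byte. When a file is opened in a Unicode mode, you should probably read it with the high-level Unicode functions such as getwc. You could use _wfopen_s with L"rt, ccs=UTF-8", or, if you need the sharing support, keep your existing _wsopen_s call and follow it with _wfdopen.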
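Both ideas can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names utf16le_to_string and read_utf8_file are made up for this example, the first helper assumes the buffer really contains little-endian UTF-16 as described above, and the second part is Windows-only and has not been checked against VS2010 specifically (std::u16string needs a compiler with C++11 library support).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#include <share.h>
#include <sys/stat.h>
#include <wchar.h>
#endif

// Hypothetical helper: reinterprets the bytes that _read stored in a char
// buffer as little-endian UTF-16 code units (assumption: _O_U8TEXT made
// _read deliver UTF-16, as the symptoms suggest).
std::u16string utf16le_to_string(const unsigned char* buf, std::size_t nbytes)
{
    std::u16string out;
    out.reserve(nbytes / 2);
    for (std::size_t i = 0; i + 1 < nbytes; i += 2) {
        // Low byte first, then high byte (0 for ASCII, 4 for Cyrillic).
        out.push_back(static_cast<char16_t>(buf[i] | (buf[i + 1] << 8)));
    }
    return out;
}

#ifdef _WIN32
// Hypothetical helper: opens a UTF-8 file with sharing support, wraps the
// descriptor in a stream, and appends wide characters to text until EOF.
bool read_utf8_file(const wchar_t* path, std::wstring& text)
{
    int fd = -1;
    if (_wsopen_s(&fd, path, _O_RDONLY | _O_U8TEXT, _SH_DENYNO, _S_IREAD) != 0)
        return false;

    FILE* fp = _wfdopen(fd, L"r");
    if (fp == NULL) {
        _close(fd);
        return false;
    }

    for (wint_t c = fgetwc(fp); c != WEOF; c = fgetwc(fp))
        text.push_back(static_cast<wchar_t>(c));

    fclose(fp);  // closing the stream also closes fd
    return true;
}
#endif
```

With the code from the question, the first helper would be called as utf16le_to_string(reinterpret_cast<const unsigned char*>(buf), bspr); the second as read_utf8_file(L"c:\\example.html", text) with a std::wstring text. The stream-based version is probably the simpler choice, since it avoids sizing the buffer up front.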