
C++ fread gibberish

For some reason my buffer is getting filled with gibberish, and I'm not sure why. I even checked my file with a hex editor to verify that my characters are saved in a 2-byte Unicode format. I'm not sure what's wrong.

[on file open]

fseek(_file_pointer, 0, SEEK_END);
this->_length = ftell(this->_file_pointer) / sizeof(chr);

[Main]

//there is a reason for this, I just 
//didn't include the code that tells why
typedef wchar_t chr;
chr *buffer = (chr*)malloc(f->_length*sizeof(chr));
if(buffer == NULL)return;
memset(buffer,0,f->_length*sizeof(chr));
f->Read_Whole_File(buffer);
f->Close();
free(buffer);

[Read_Whole_File]

void Read_Whole_File(chr *buffer)
{
    if(buffer == NULL)
    {
        this->_IsError = true;
        return;
    }
    fseek(this->_file_pointer, 0, SEEK_SET);
    int a = sizeof(buffer[0]);//for debugging purposes  
    fread(buffer, a, _length, this->_file_pointer); 
}


Assuming your error handling (which you said you've omitted here) is sound, I see two possible causes of the problem:

  1. First of all, wchar_t is not necessarily 2 bytes; its size is implementation-defined. For example, on Linux it is most likely 4 bytes.

  2. It may be that the file is UTF-16BE (big-endian) while you are running on a little-endian platform, so the wchar_t values in your buffer have their byte order swapped.

Or, it may be both. Please update your question with some details about your platform and a few bytes from the sample file in hex (if possible).
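To collect the details requested above, a small diagnostic sketch like the following can help. host_is_little_endian reports the host byte order, and hex_dump_prefix returns the first bytes of the file as hex. Both function names are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: true if the host stores the low byte of a
// multi-byte integer first (little-endian), false otherwise.
bool host_is_little_endian()
{
    const std::uint16_t probe = 0x0102;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);   // inspect the byte stored first
    return first_byte == 0x02;
}

// Hypothetical helper: read up to `count` bytes from the start of `path`
// and return them as a space-separated hex string, e.g. "ff fe 48 00".
// Returns an empty string if the file cannot be opened.
std::string hex_dump_prefix(const char *path, std::size_t count)
{
    std::FILE *fp = std::fopen(path, "rb");   // "rb": no newline translation
    if (fp == nullptr)
        return "";

    std::string out;
    char hex[4];
    int c;
    for (std::size_t i = 0; i < count && (c = std::fgetc(fp)) != EOF; ++i) {
        std::snprintf(hex, sizeof hex, "%02x", static_cast<unsigned>(c));
        if (!out.empty())
            out += ' ';
        out += hex;
    }
    std::fclose(fp);
    return out;
}
```

For example, printing sizeof(wchar_t) alongside hex_dump_prefix(path, 16) shows at a glance whether wchar_t matches the file's 2-byte units and whether the file starts with a byte order mark (FE FF for big-endian, FF FE for little-endian).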

In any case, you should not make any assumptions about sizes of standard C or C++ types when dealing with Unicode files.

For example, if you want to read UTF-16BE, use the C99 uint16_t type (or an equivalent type that is guaranteed to be 16 bits), and swap the byte order of your input depending on your platform's endianness and the file's endianness. You can detect the file's endianness from a byte order mark (BOM) if one is present in the file.
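A minimal sketch of that approach, assuming the whole file has already been read into a byte vector. Instead of swapping after the fact, it assembles each 16-bit unit from two bytes with shifts, which yields the same values whatever the host's byte order is. decode_utf16 and its default_big_endian parameter are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode raw bytes of a UTF-16 file into 16-bit code units.
// Byte order comes from a BOM if present (FE FF = big-endian, FF FE = little-endian);
// otherwise `default_big_endian` is used (assumption: the caller knows the file's order).
// Each unit is built from two bytes with shifts, so the result does not depend on
// the host's own byte order. A trailing odd byte is ignored. Surrogate pairs are
// left as two separate code units.
std::vector<std::uint16_t> decode_utf16(const std::vector<unsigned char> &bytes,
                                        bool default_big_endian = true)
{
    std::size_t pos = 0;
    bool big_endian = default_big_endian;

    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {        // big-endian BOM
            big_endian = true;
            pos = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) { // little-endian BOM
            big_endian = false;
            pos = 2;
        }
    }

    std::vector<std::uint16_t> units;
    units.reserve((bytes.size() - pos) / 2);
    for (; pos + 1 < bytes.size(); pos += 2) {
        const std::uint16_t hi = big_endian ? bytes[pos] : bytes[pos + 1];
        const std::uint16_t lo = big_endian ? bytes[pos + 1] : bytes[pos];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}
```

Using std::uint16_t rather than wchar_t keeps the element size fixed at 2 bytes on every platform, which sidesteps the implementation-defined size of wchar_t.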

Alternatively, use a third-party Unicode library such as ICU. It takes care of all the platform-specific details and will save you a lot of debugging time in a sizable project.


The signature of fread is:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

Here size is the size of each element in bytes, and nmemb is the number of elements. In your case, size is sizeof(chr) and nmemb is the length of the buffer in characters.
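A sketch of a whole-file read that uses those parameters with a fixed 2-byte element type and checks fread's return value, which counts elements actually read, not bytes. read_all_units is a hypothetical name; the units it returns are the file's bytes reinterpreted in host byte order, so they still need swapping if the file's byte order differs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: read an entire binary stream into 16-bit units.
//   size  = sizeof(std::uint16_t)  (bytes per element)
//   nmemb = number of elements wanted
// fread returns how many *elements* it actually read, which may be fewer
// than nmemb on a short read or an error, so the vector is trimmed to match.
std::vector<std::uint16_t> read_all_units(std::FILE *fp)
{
    std::vector<std::uint16_t> buf;
    if (std::fseek(fp, 0, SEEK_END) != 0)
        return buf;
    const long bytes = std::ftell(fp);
    if (bytes < 0 || std::fseek(fp, 0, SEEK_SET) != 0)
        return buf;

    const std::size_t nmemb = static_cast<std::size_t>(bytes) / sizeof(std::uint16_t);
    if (nmemb == 0)
        return buf;
    buf.resize(nmemb);
    const std::size_t got = std::fread(buf.data(), sizeof(std::uint16_t), nmemb, fp);
    buf.resize(got);   // keep only the elements that were really read
    return buf;
}
```

The stream is assumed to have been opened in binary mode ("rb"); the caller remains responsible for closing it.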


If you are in C++, why not use a std::fstream?
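A minimal sketch of reading the file with the standard library streams instead of fopen/fread. read_file_bytes is a hypothetical helper; the resulting bytes can then be decoded with whatever byte-order handling the file needs.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into a byte vector with std::ifstream.
// std::ios::binary matters: without it, some platforms (e.g. Windows) translate
// line endings, which would corrupt 2-byte UTF-16 data.
// Returns false if the file cannot be opened or a read error occurs.
bool read_file_bytes(const std::string &path, std::vector<unsigned char> &out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return !in.bad();
}
```

The stream closes itself when it goes out of scope, so there is no Close() call to forget, and the vector manages its own memory in place of malloc/free.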

Apart from that, since you are using Unicode, note that C and C++ are seriously lacking in standard Unicode support. The answers here might help you read these Unicode files.

But I must stress again: if you are using C++, use the STL. Also, check the excellent answer to this question: std::wstring vs std::string.
