
Visual Studio C++ 2008 Manipulating Bytes?

I'm trying to write strictly binary data to files (no encoding). The problem is that when I hex dump the files, I see rather weird behavior. Either of the methods below for constructing the file gives the same result. I also tried passing System::Text::Encoding::Default to the streams as a test.

StreamWriter^ binWriter = gcnew StreamWriter(gcnew FileStream("test.bin",FileMode::Create));

(Also used this method)
FileStream^ tempBin = gcnew FileStream("test.bin",FileMode::Create);
BinaryWriter^ binWriter = gcnew BinaryWriter(tempBin);


binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);
.
.
binWriter->Write(0x9F);

Writing that sequence of bytes, I noticed the only bytes that weren't converted to 0x3F in the hex dump were 0x81,0x8D,0x90,0x9D, ... and I have no idea why.

I also tried writing character arrays, and something similar happens, e.g.:

array<wchar_t,1>^ OT_Random_Delta_Limits = {0x00,0x00,0x03,0x79,0x00,0x00,0x04,0x88};
binWriter->Write(OT_Random_Delta_Limits);

0x88 would be written as 0x3F.


If you want to stick to binary files then don't use StreamWriter. Just use a FileStream and Write/WriteByte. StreamWriters (and TextWriters in general) are expressly designed for text. Whether you want an encoding or not, one will be applied, because when you call StreamWriter.Write, you're writing a char, not a byte.

Don't create arrays of wchar_t values either - again, those are for characters, i.e. text.

BinaryWriter.Write should have worked for you, unless it was promoting the values to char, in which case you'd have exactly the same problem.

By the way, without specifying any encoding, I'd expect you not to get 0x3F values at all, but rather the bytes of the UTF-8 encoding of those characters.

When you specified Encoding.Default, you'd have seen 0x3F for any Unicode values not in that encoding.
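
A quick way to see both behaviours is to encode a single character by hand. This is only a sketch: the class name is made up, and Encoding.ASCII stands in for "an encoding that doesn't contain the character", since what Encoding.Default actually is depends on the machine.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // The char U+0080, i.e. what Write sees if 0x80 ends up as a char.
        string text = ((char) 0x80).ToString();

        // UTF-8 encodes U+0080 as two bytes.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf8));   // prints C2-80

        // ASCII has no mapping for U+0080, so the encoder substitutes '?'.
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(ascii));  // prints 3F
    }
}
```

Neither result is the single byte 0x80, which is why no choice of encoding gives you "no encoding".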

Anyway, the basic lesson is to stick to Stream when you want to deal with binary data rather than text.
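
For the byte sequence in the question, a minimal sketch using only a Stream might look like this. The class name is hypothetical; "test.bin" is the file name from the question, and the read-back at the end is just there to check the result.

```csharp
using System;
using System.IO;

class RawBytesDemo
{
    static void Main()
    {
        using (FileStream output = new FileStream("test.bin", FileMode.Create))
        {
            // WriteByte takes a byte, so no character encoder is involved.
            for (int value = 0x80; value <= 0x9F; value++)
            {
                output.WriteByte((byte) value);
            }
        }

        // Read the file back and print it as hex to confirm the contents.
        byte[] written = File.ReadAllBytes("test.bin");
        Console.WriteLine(BitConverter.ToString(written));
    }
}
```

This should print the 32 values 80 through 9F, with no 3F among them.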

EDIT: Okay, it would be something like:

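// Needs: using System; using System.IO;
// Reads pairs of hex digits from input and writes each pair to output as one raw byte.
public static void ConvertHex(TextReader input, Stream output)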
{
    while (true)
    {
        int firstNybble = input.Read();
        if (firstNybble == -1)
        {
            return;
        }
        int secondNybble = input.Read();
        if (secondNybble == -1)
        {
            throw new IOException("Reader finished half way through a byte");
        }
        int value = (ParseNybble(firstNybble) << 4) + ParseNybble(secondNybble);
        output.WriteByte((byte) value);
    }
}

// value would actually be a char, but as we've got an int in the above code,
// it just makes things a bit easier
private static int ParseNybble(int value)
{
    if (value >= '0' && value <= '9') return value - '0';
    if (value >= 'A' && value <= 'F') return value - 'A' + 10;
    if (value >= 'a' && value <= 'f') return value - 'a' + 10;
    throw new ArgumentException("Invalid nybble: " + (char) value);
}

This is very inefficient in terms of buffering etc., but it should get you started.


A BinaryWriter initialized with just a stream uses UTF-8 by default for any chars or strings that are written. I'm guessing that the

binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);

calls are binding to the Write(char) overload, so they're going through the character encoder. I'm not very familiar with C++/CLI, but it seems to me that these calls should be binding to Write(Int32), which shouldn't have this problem. Maybe your code is really calling Write() with a char variable that's set to the values in your example; that would account for this behavior.
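
To make the overload difference concrete, here is a small C# sketch (not C++/CLI, so treat the binding in your own code as something to verify). The class and the Dump helper are hypothetical names; the helper just captures what one Write call produces on a MemoryStream.

```csharp
using System;
using System.IO;

class OverloadDemo
{
    // Hypothetical helper: runs one Write call and returns the resulting bytes as hex.
    static string Dump(Action<BinaryWriter> write)
    {
        MemoryStream stream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(stream); // UTF-8 by default
        write(writer);
        writer.Flush();
        return BitConverter.ToString(stream.ToArray());
    }

    static void Main()
    {
        // An int argument binds to Write(Int32): four bytes, little-endian.
        Console.WriteLine(Dump(w => w.Write(0x80)));         // prints 80-00-00-00

        // A byte argument binds to Write(Byte): exactly one raw byte.
        Console.WriteLine(Dump(w => w.Write((byte) 0x80)));  // prints 80

        // A char argument binds to Write(Char) and goes through the UTF-8 encoder.
        Console.WriteLine(Dump(w => w.Write((char) 0x80)));  // prints C2-80
    }
}
```

So even when Write(Int32) is chosen, you get four bytes per call rather than one; casting to a byte (or using Stream.WriteByte) is what gives a single raw byte.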


0x3F is commonly known as the ASCII character '?'; the characters that are mapping to it are control characters with no printable representation. As Jon points out, use a binary stream rather than a text-oriented output mechanism for raw binary data.

EDIT -- actually, your results look like the inverse of what I would expect. In the default code page 1252, the non-printable characters (i.e. the ones likely to map to '?') in that range are 0x81, 0x8D, 0x8F, 0x90 and 0x9D.
