开发者

Run length encoding of hexadecimal strings including newlines

I am implementing run length encoding using the GZipStream class in a C# winforms app.

Data is provided as a series of strings separated by newline characters, like this:

FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF

Before compressing, I convert the string to开发者_Go百科 a byte array, but doing so fails if newline characters are present.

Each newline is significant, but I am not sure how to preserve their position in the encoding.

Here is the code I am using to convert to a byte array:

private static byte[] HexStringToByteArray(string _hex)
{
    _hex = _hex.Replace("\r\n", "");
    if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
    int l = _hex.Length / 2;
    byte[] b = new byte[l];
    for (int i = 0; i < l; i++)
    b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
    return b;
}

Convert.ToByte throws a FormatException if the newlines are not removed, with the info: "Additional non-parsable characters are at the end of the string." Which doesn't surprise me.

What would be the best way to make sure newline characters can be included properly?

Note I should add that the compressed version of this string must itself be a string that can be included in an XML document.

Edit:

I have tried to simply convert the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods:

    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;

        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);

        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }


Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?

Assuming you want to go ahead with converting to binary, I can suggest two options:

  • At the start of each line, write a number stating how many bytes are in the line. Then when you decompress, you read and convert that many bytes, then write a newline. If you know that each line is always going to be less than 256 bytes long, you can just represent this as a single byte. Otherwise you might want a larger fixed size, or some variable size encoding (e.g. "while the top bit is set, this is still part of the number") - the latter gets hairy pretty quickly.
  • Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You'd then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data, if you read an 0xFF you'd then read the next byte to determine whether it represented a newline or a genuine 0xFF.

EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn't be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you've missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.


If the data you posted is representative of all the data, then you have a newline every 4 bytes, so if you need it when converting back, just stick one in every 4 bytes of data

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜