Run length encoding of hexadecimal strings including newlines

2023-01-11 08:29

I am implementing run length encoding using the GZipStream class in a C# winforms app.

Data is provided as a series of strings separated by newline characters, like this:

FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF

Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.

Each newline is significant, but I am not sure how to preserve their position in the encoding.

Here is the code I am using to convert to a byte array:

private static byte[] HexStringToByteArray(string _hex)
{
    // Strip line breaks ("\r\n" or a bare "\n"); note that their positions are lost here.
    _hex = _hex.Replace("\r", "").Replace("\n", "");
    if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
    int l = _hex.Length / 2;
    byte[] b = new byte[l];
    // Each pair of hex digits becomes one byte.
    for (int i = 0; i < l; i++)
        b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
    return b;
}

Convert.ToByte throws a FormatException if the newlines are not removed, with the message "Additional non-parsable characters are at the end of the string.", which doesn't surprise me.

What would be the best way to make sure newline characters can be included properly?

Note: the compressed version of this string must itself be a string that can be included in an XML document.

Edit:

I have tried to simply convert the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods:

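    // Requires: using System; using System.IO; using System.IO.Compression; using System.Text;

    // Encodes the text (hex digits and newlines) as one ASCII byte per character.
    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        // leaveOpen = true, so ms stays usable after the GZipStream is closed.
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;

        // Copy the compressed bytes out of the stream.
        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);

        // Prefix the compressed data with the 4-byte uncompressed length.
        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }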

Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?
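One way to check this is simply to measure both. The sketch below is only an illustration: the class and method names (SizeComparison, Gzip, HexToBytes, Measure) are made up for this example, and the binary form drops the newlines, so the two numbers are not a strictly like-for-like comparison.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SizeComparison
{
    // Gzip a buffer and return the compressed bytes.
    public static byte[] Gzip(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(data, 0, data.Length);
            }
            // The GZipStream is closed here, so everything has been flushed into ms.
            return ms.ToArray();
        }
    }

    // Parse hex digits into raw bytes, discarding line breaks.
    public static byte[] HexToBytes(string hex)
    {
        hex = hex.Replace("\r", "").Replace("\n", "");
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Returns { compressed size of the raw text, compressed size of the binary form }.
    public static int[] Measure(string text)
    {
        int asText = Gzip(Encoding.ASCII.GetBytes(text)).Length;
        int asBinary = Gzip(HexToBytes(text)).Length;
        return new[] { asText, asBinary };
    }
}
```

Running Measure on a representative sample of the real data should show whether the binary conversion is worth the extra complexity.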

Assuming you want to go ahead with converting to binary, I can suggest two options:

1. At the start of each line, write a number stating how many bytes are in the line. Then when you decompress, you read and convert that many bytes, then write a newline. If you know that each line is always going to be less than 256 bytes long, you can just represent this as a single byte. Otherwise you might want a larger fixed size, or some variable size encoding (e.g. "while the top bit is set, this is still part of the number") - the latter gets hairy pretty quickly.

2. Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You'd then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data, if you read an 0xFF you'd then read the next byte to determine whether it represented a newline or a genuine 0xFF.
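Both options can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, line breaks are normalized to "\n", the length-prefixed variant assumes fewer than 256 bytes per line, and decoded hex digits always come back in upper case.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NewlinePreservingHex
{
    // Option 1: each line is written as [byte count][bytes...].
    public static byte[] EncodeLengthPrefixed(string text)
    {
        var output = new List<byte>();
        foreach (string line in SplitLines(text))
        {
            byte[] data = ParseHex(line);
            if (data.Length > 255) throw new FormatException("Line too long for a one-byte length.");
            output.Add((byte)data.Length);
            output.AddRange(data);
        }
        return output.ToArray();
    }

    public static string DecodeLengthPrefixed(byte[] encoded)
    {
        var sb = new StringBuilder();
        int pos = 0;
        bool first = true;
        while (pos < encoded.Length)
        {
            int count = encoded[pos++];
            if (pos + count > encoded.Length) throw new FormatException("Truncated line.");
            if (!first) sb.Append('\n');   // a newline separates consecutive lines
            first = false;
            for (int i = 0; i < count; i++) sb.Append(encoded[pos++].ToString("X2"));
        }
        return sb.ToString();
    }

    // Option 2: newline becomes 0xFF 0x00, a genuine 0xFF becomes 0xFF 0xFF.
    public static byte[] EncodeEscaped(string text)
    {
        var output = new List<byte>();
        string[] lines = SplitLines(text);
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0) { output.Add(0xFF); output.Add(0x00); }
            foreach (byte b in ParseHex(lines[i]))
            {
                output.Add(b);
                if (b == 0xFF) output.Add(0xFF);   // double up a genuine 0xFF
            }
        }
        return output.ToArray();
    }

    public static string DecodeEscaped(byte[] encoded)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            if (encoded[i] == 0xFF)
            {
                // Look at the byte after the escape marker.
                if (++i >= encoded.Length) throw new FormatException("Dangling escape byte.");
                if (encoded[i] == 0x00) { sb.Append('\n'); continue; }
                if (encoded[i] != 0xFF) throw new FormatException("Unknown escape sequence.");
            }
            sb.Append(encoded[i].ToString("X2"));
        }
        return sb.ToString();
    }

    private static string[] SplitLines(string text)
    {
        return text.Replace("\r\n", "\n").Split('\n');
    }

    private static byte[] ParseHex(string hex)
    {
        if (hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```

Note that with data like the sample in the question, which is mostly 0xFF, the escaping variant roughly doubles the size before compression, since every 0xFF is written twice. The length prefix adds only one byte per line.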

EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn't be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you've missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.
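As a rough sketch of that suggestion, the text can be compressed with GZipStream, read back with ToArray, and turned into a Base64 string, whose characters are all safe to place in an XML document. The class and method names here are hypothetical, and the sketch assumes the input is plain ASCII (hex digits and newlines), as in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class XmlSafeCompression
{
    // Compress the text and return it as a Base64 string.
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.ASCII.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(raw, 0, raw.Length);
            }
            // ToArray returns the whole contents, with no Position/Read juggling.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverse the process: Base64 -> gzip bytes -> original text.
    public static string DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return Encoding.ASCII.GetString(output.ToArray());
        }
    }
}
```

Because the newlines are just ordinary characters in the text being compressed, they survive the round trip without any special handling.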

If the data you posted is representative of all the data, then you have a newline every 4 bytes, so if you need it when converting back, just insert one after every 4 bytes of data.
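A minimal sketch of that idea, assuming every line really does hold the same number of bytes (the helper name ToHexLines and its bytesPerLine parameter are invented for this example):

```csharp
using System;
using System.Text;

public static class FixedWidthLines
{
    // Convert bytes back to hex text, starting a new line after every bytesPerLine bytes.
    public static string ToHexLines(byte[] data, int bytesPerLine)
    {
        if (bytesPerLine <= 0) throw new ArgumentOutOfRangeException("bytesPerLine");
        var sb = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
        {
            // Insert the separator before each new group, so no newline trails the last line.
            if (i > 0 && i % bytesPerLine == 0) sb.Append('\n');
            sb.Append(data[i].ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For the sample in the question this would be called with bytesPerLine set to 4. It breaks down as soon as a line has a different length, in which case one of the explicit encodings is needed.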
