packing and unpacking variable length array/string using the struct module in python

2023-01-16 17:02

I am trying to get a grip on packing and unpacking binary data in Python 3. It's actually not that hard to understand, except for one problem:

What if I have a variable-length text string and want to pack and unpack it in the most elegant manner?

As far as I can tell from the manual, I can only unpack fixed-size strings directly. In that case, is there an elegant way of getting around this limitation without padding with lots and lots of unnecessary zeroes?

The struct module only supports fixed-length structures. For variable-length strings, your options are either:

Dynamically construct your format string (a str has to be converted to bytes before passing it to pack()):
```
s = bytes(s, 'utf-8')    # Or other appropriate encoding
struct.pack("I%ds" % (len(s),), len(s), s)
```
Skip struct for the string itself and simply append the encoded bytes to your pack()-ed output: struct.pack("I", len(s)) + s
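As a quick check that the two options really produce the same bytes, here is a minimal sketch. The function names pack_dynamic and pack_concat are made up for illustration, and I have assumed a little-endian "<I" length prefix so the output is the same on every platform (the snippets above use the native "I").

```python
import struct


def pack_dynamic(payload: bytes) -> bytes:
    # Option 1: build the format string from the payload length.
    return struct.pack("<I%ds" % len(payload), len(payload), payload)


def pack_concat(payload: bytes) -> bytes:
    # Option 2: pack only the length, then append the raw bytes.
    return struct.pack("<I", len(payload)) + payload


payload = "héllo".encode("utf-8")  # 6 bytes: 'é' takes two bytes in UTF-8
assert pack_dynamic(payload) == pack_concat(payload)
print(pack_dynamic(payload))  # b'\x06\x00\x00\x00h\xc3\xa9llo'
```

Note that the length written is the number of encoded bytes, not the number of characters.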

For unpacking, you just have to unpack a bit at a time:

```
(i,), data = struct.unpack("I", data[:4]), data[4:]
s, data = data[:i], data[i:]
```

If you're doing a lot of this, you can always add a helper function which uses calcsize to do the string slicing:

```
def unpack_helper(fmt, data):
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]
```
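To show how the helper might be used on a record that mixes fixed fields with one variable-length string, here is a sketch. The record layout (a 16-bit id, a length-prefixed UTF-8 name, a double) and the names pack_record and unpack_record are my own assumptions, not part of the answer above.

```python
import struct


def unpack_helper(fmt, data):
    # Same idea as the helper above: unpack a fixed-size prefix, return the rest.
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]


def pack_record(user_id: int, name: str, score: float) -> bytes:
    raw = name.encode("utf-8")
    # Fixed header (id + name length), then the name bytes, then the score.
    return struct.pack("<HI", user_id, len(raw)) + raw + struct.pack("<d", score)


def unpack_record(data: bytes):
    (user_id, name_len), data = unpack_helper("<HI", data)
    raw, data = data[:name_len], data[name_len:]
    (score,), data = unpack_helper("<d", data)
    # Whatever is left in data belongs to the next record (empty here).
    return user_id, raw.decode("utf-8"), score, data


record = pack_record(7, "Zoë", 1.5)
assert unpack_record(record) == (7, "Zoë", 1.5, b"")
```

Each step consumes a prefix and hands back the remainder, so the variable-length part can sit anywhere in the record.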

I googled this question and found a couple of solutions.

construct

An elaborate, flexible solution.

Instead of writing imperative code to parse a piece of data, you declaratively define a data structure that describes your data. As this data structure is not code, you can use it in one direction to parse data into Pythonic objects, and in the other direction, convert (“build”) objects into binary data.

The library provides both simple, atomic constructs (such as integers of various sizes) and composite ones that allow you to form hierarchical structures of increasing complexity. Construct features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier:

Updated for Python 3.x and construct 2.10.67. construct now has a native PascalString, so the struct in this example is renamed to myPascalString.


    from construct import *
    
    myPascalString = Struct(
        "length" / Int8ul,
        "data" / Bytes(lambda ctx: ctx.length)
    )

    >>> myPascalString.parse(b'\x05helloXXX')
    Container(length=5, data=b'hello')
    >>> myPascalString.build(Container(length=6, data=b"foobar"))
    b'\x06foobar'


    myPascalString2 = ExprAdapter(myPascalString,
        encoder=lambda obj, ctx: Container(length=len(obj), data=obj),
        decoder=lambda obj, ctx: obj.data
    )

    >>> myPascalString2.parse(b"\x05hello")
    b'hello'

    >>> myPascalString2.build(b"i'm a long string")
    b"\x11i'm a long string"

Edit: also pay attention to that ExprAdapter. Once the native PascalString no longer does what you need, this is the approach you will end up using.

netstruct

A quick solution if you only need a struct extension for variable-length byte sequences. Nesting a variable-length structure can be achieved by packing the result of an earlier pack call.

NetStruct supports a new formatting character, the dollar sign ($). The dollar sign represents a variable-length string, encoded with its length preceding the string itself.

Edit: it looks like the length of a variable-length string is stored using the data type given in the format just before the dollar sign. Thus the maximum length of a variable-length string is 255 if the length is a byte, 65535 if it is a word, and so on.

```
>>> import netstruct
>>> netstruct.pack(b"b$", b"Hello World!")
b'\x0cHello World!'

>>> netstruct.unpack(b"b$", b"\x0cHello World!")
[b'Hello World!']
```

An easy way I found to handle a variable length when packing a string (already encoded to bytes) is:

```
pack('{}s'.format(len(string)), string)
```

Unpacking works much the same way, provided data contains nothing but the string:

```
unpack('{}s'.format(len(data)), data)
```
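A small round trip makes two details of this approach visible: struct wants bytes rather than str, and unpack() returns a tuple even for a single field. This is only a sketch; the variable names are mine.

```python
import struct

text = "naïve"
raw = text.encode("utf-8")  # struct works on bytes, not str
packed = struct.pack("{}s".format(len(raw)), raw)

# unpack() always returns a tuple, even for a single field.
(unpacked,) = struct.unpack("{}s".format(len(packed)), packed)
assert unpacked.decode("utf-8") == text
```

Because no length is stored, this only works when the receiver already knows where the string starts and ends, for example when the buffer holds the string and nothing else.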

Here are some wrapper functions I wrote that help; they seem to work.

Here's the unpacking helper:

```
import re
import struct

def unpack_from(fmt, data, offset=0):
    # Split off an optional byte-order prefix; default to native ('@').
    if fmt and fmt[0] in ('@', '=', '<', '>', '!'):
        byte_order, fmt = fmt[0], fmt[1:]
    else:
        byte_order = '@'
    args = ()
    # Isolate every 'p' so each Pascal string is handled on its own.
    for sub_fmt in filter(None, re.sub("p", "\tp\t", fmt).split('\t')):
        if sub_fmt == 'p':
            # Read the length byte, then unpack length byte + payload as 'Np'.
            (str_len,) = struct.unpack_from('B', data, offset)
            sub_fmt = str(str_len + 1) + 'p'
            sub_size = str_len + 1
        else:
            sub_fmt = byte_order + sub_fmt
            sub_size = struct.calcsize(sub_fmt)
        args += struct.unpack_from(sub_fmt, data, offset)
        offset += sub_size
    return args
```

Here's the packing helper:

```
import re
import struct

def pack(fmt, *args):
    # Split off an optional byte-order prefix; default to native ('@').
    if fmt and fmt[0] in ('@', '=', '<', '>', '!'):
        byte_order, fmt = fmt[0], fmt[1:]
    else:
        byte_order = '@'
    data = b''  # bytes, not str: struct.pack() returns bytes in Python 3
    for sub_fmt in filter(None, re.sub("p", "\tp\t", fmt).split('\t')):
        if sub_fmt == 'p':
            # A Pascal string consumes one argument (a bytes object).
            sub_args, args = (args[0],), args[1:]
            sub_fmt = str(len(sub_args[0]) + 1) + 'p'
        else:
            # One argument per format character (repeat counts are not handled).
            sub_args, args = args[:len(sub_fmt)], args[len(sub_fmt):]
            sub_fmt = byte_order + sub_fmt
        data += struct.pack(sub_fmt, *sub_args)
    return data
```
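For reference, the wrappers lean on the 'p' (Pascal string) format that struct already has: the count in front of 'p' is the total field size including the one-byte length. Here is a stripped-down sketch of just that part; the names pack_pascal and unpack_pascal are hypothetical, and the 255-byte limit follows from the single length byte.

```python
import struct


def pack_pascal(payload: bytes) -> bytes:
    if len(payload) > 255:
        raise ValueError("a one-byte length prefix holds at most 255")
    # The count before 'p' is the total field size, including the length byte.
    return struct.pack("%dp" % (len(payload) + 1), payload)


def unpack_pascal(data: bytes, offset: int = 0):
    # Peek at the length byte, then let struct decode the whole field.
    size = data[offset] + 1
    (payload,) = struct.unpack_from("%dp" % size, data, offset)
    # Return the payload and the offset just past this field.
    return payload, offset + size


assert pack_pascal(b"hello") == b"\x05hello"
assert unpack_pascal(b"\x05helloXX") == (b"hello", 6)
```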

To pack, use

```
packed = bytes('sample string', 'utf-8')
```

To unpack, use

```
string = packed.decode('utf-8')
```

This is a quite simple workaround, and it only works when the data is a single UTF-8 string.

Nice, but it can't handle repeat counts, such as '6B' for 'BBBBBB'. The solution would be to expand the format string in both functions before use. I came up with this:

```
def pack(fmt, *args):
  fmt = re.sub(r'(\d+)([^\ds])', lambda x: x.group(2) * int(x.group(1)), fmt)
  ...
```

The same goes for unpack. Maybe not the most elegant, but it works :)
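To see what the expansion does in isolation, here is a sketch pulled out into its own function. The name expand_counts is made up, and I have also excluded 'p' from the expansion (an assumption on my part), since for 's' and 'p' the number is a byte length rather than a repeat count.

```python
import re
import struct


def expand_counts(fmt: str) -> str:
    # '3B' -> 'BBB'; 's' and 'p' keep their number because it is a byte length.
    return re.sub(r"(\d+)([^\dsp])", lambda m: m.group(2) * int(m.group(1)), fmt)


assert expand_counts("<2H3B4s") == "<HHBBB4s"
# The expanded format describes exactly the same layout.
assert struct.calcsize(expand_counts("<2H3B4s")) == struct.calcsize("<2H3B4s")
```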

Another silly but very simple approach (as others mentioned, struct has no built-in pack/unpack support for this, so keep that in mind):

```
import struct


def pack_variable_length_string(s: str) -> bytes:
    str_bytes = s.encode('UTF-8')
    # Store the byte length, not len(s): the two differ for non-ASCII text.
    str_size_bytes = struct.pack('!Q', len(str_bytes))
    return str_size_bytes + str_bytes


def unpack_variable_length_string(sb: bytes, offset=0) -> (str, int):
    # Returns the decoded string and the offset of the next record.
    str_size_bytes = struct.unpack('!Q', sb[offset:offset + 8])[0]
    return sb[offset + 8:offset + 8 + str_size_bytes].decode('UTF-8'), 8 + str_size_bytes + offset


if __name__ == '__main__':
    b = pack_variable_length_string('Worked maybe?') + \
        pack_variable_length_string('It seems it did?') + \
        pack_variable_length_string('Are you sure?') + \
        pack_variable_length_string('Surely.')
    next_offset = 0
    for i in range(4):
        s, next_offset = unpack_variable_length_string(b, next_offset)
        print(s)
```
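If you don't know in advance how many strings the buffer holds, one possible variation is to walk it with unpack_from and an offset until the end is reached, which also avoids slicing out the header. This is a sketch under the same 8-byte '!Q' length prefix; the names LENGTH, pack_strings and iter_strings are mine.

```python
import struct

LENGTH = struct.Struct("!Q")  # 8-byte big-endian length prefix, as above


def pack_strings(strings) -> bytes:
    out = bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        out += LENGTH.pack(len(raw)) + raw
    return bytes(out)


def iter_strings(buf: bytes):
    offset = 0
    while offset < len(buf):
        # Read the length in place instead of slicing the header out first.
        (size,) = LENGTH.unpack_from(buf, offset)
        offset += LENGTH.size
        end = offset + size
        if end > len(buf):
            raise ValueError("truncated string payload")
        yield buf[offset:end].decode("utf-8")
        offset = end


packed = pack_strings(["Worked maybe?", "", "日本語"])
assert list(iter_strings(packed)) == ["Worked maybe?", "", "日本語"]
```

A truncated header makes unpack_from raise struct.error, and a truncated payload is reported by the explicit check.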
