raw decoder for protobufs format

2023-04-03 22:46

I'd like to find a way to convert a binary protobuf message into a human readable description of the contained data, without using the .proto files.

The background is that I have a protobuf message that is being rejected by the parser on Android, but it's not entirely clear why. I could go through the message by hand, but that's rather tedious.

I tried protoc --decode_raw, but it just gives the error "Failed to parse input.". I googled, hoping someone had built a nice web utility for this, but haven't found anything obvious.

I'm just hoping to get some output like:

field 1: varint: 128
field 4: string: "foo"

Any pointers in the right direction would be most welcome!

For Posterity: Google's protocol buffer tools have the ability to decode raw buffers.

Just send the unknown buffer to protoc on standard input and pass the --decode_raw flag:

$ protoc --decode_raw < has_no_proto.buff
2 {
  2: "Error retrieving information from server. [RH-02]"
}

So here's a message with field 2 set to an embedded message which in turn has its second field set to a string telling me I pissed off Google Play.

Type information isn't definite (it looks like it will try to display all binary data as strings -- but your requirement for varint/string/submessage distinction is met).
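If protoc isn't at hand, the wire format is simple enough to walk yourself. Here is a minimal sketch in Python that prints output in roughly the shape asked for in the question. It is my own illustration, not what protoc does internally: the names read_varint and decode_raw are made up for this example, it only looks at top-level fields, and it handles only the varint, fixed64, length-delimited and fixed32 wire types.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 70:  # more than 10 bytes is never a valid varint
            raise ValueError("varint too long")


def decode_raw(buf):
    """Return a list of (field_number, kind, value) for top-level fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
            fields.append((field, "varint", value))
        elif wire_type in (1, 5):
            size = 8 if wire_type == 1 else 4
            if pos + size > len(buf):
                raise ValueError("truncated fixed-width field")
            value = int.from_bytes(buf[pos:pos + size], "little")
            fields.append((field, "fixed64" if size == 8 else "fixed32", value))
            pos += size
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            fields.append((field, "length-delimited", buf[pos:pos + length]))
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


if __name__ == "__main__":
    sample = bytes.fromhex("0880012203666f6f")
    for field, kind, value in decode_raw(sample):
        print(f"field {field}: {kind}: {value!r}")
```

For the eight sample bytes above, this reports field 1 as the varint 128 and field 4 as the length-delimited bytes b'foo'. A ValueError on real data at least tells you roughly where the message stops being well formed, which may be more useful than "Failed to parse input.".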

As noted in Michel de Ruiter's answer, it's possible that your protobuf message has a length-prefix. Assuming it does, this answer should help.

(NOTE: For most of the commands below, I'm assuming your protobuf message is stored in a file called input.)

`protoc --decode_raw` + `dd` for a single message:

If it's simply a single message, then you can indeed leverage protoc --decode_raw, but you need to strip off the length-prefix header first. Assuming the header is 4 bytes long you can use dd to strip the header off of input and then feed the output into protoc.

dd bs=1 skip=4 if=input 2>/dev/null | protoc --decode_raw

`protoc-decode-lenprefix --decode_raw` for a single message:

I also wrote a script that handles the header stripping automatically:

protoc-decode-lenprefix --decode_raw < input

This script is simply a wrapper on top of protoc --decode_raw, but handles parsing out the length-prefix header and then invoking protoc.

Now, this script isn't terribly useful in this case, because we can just use the dd trick above to strip the header off. However, say we have a data stream (e.g., a file or TCP stream) containing multiple messages that are framed with length-prefix headers....

`protoc-decode-lenprefix --decode_raw` for a stream of messages:

Instead of a single protobuf message in the input file, let's say input contained multiple protobuf messages which are framed by length-prefix headers. In this case it's not possible to just use the dd trick, because you need to actually read the contents of the length-prefix header to determine how long the subsequent message in the stream is, and thus how many bytes ahead the next header+message lies. So instead of worrying about all of that, you can simply use protoc-decode-lenprefix again:

protoc-decode-lenprefix --decode_raw < input
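To make the framing logic concrete, here is a rough Python sketch of walking such a stream. This is not the source of protoc-decode-lenprefix, just an illustration of the idea: iter_fixed_prefixed is a hypothetical name, the 4-byte header matches the assumption made for the dd trick, and the little-endian byte order is purely my assumption, so check what your producer actually writes.

```python
import io


def iter_fixed_prefixed(stream, header_size=4, byteorder="little"):
    """Yield each message payload from a stream framed as [length][payload]...

    header_size and byteorder are assumptions about the framing; adjust them
    to match whatever wrote the stream.
    """
    while True:
        header = stream.read(header_size)
        if not header:
            return  # clean end of stream between messages
        if len(header) < header_size:
            raise ValueError("truncated length header")
        length = int.from_bytes(header, byteorder)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message body")
        yield payload


if __name__ == "__main__":
    # Two framed messages: a 3-byte payload followed by a 2-byte payload.
    framed = (3).to_bytes(4, "little") + b"abc" + (2).to_bytes(4, "little") + b"de"
    for index, message in enumerate(iter_fixed_prefixed(io.BytesIO(framed)), 1):
        print(index, len(message))
```

Each yielded payload can then be written to its own file and fed to protoc --decode_raw. Note that read() on a raw socket may return fewer bytes than requested, so for a live TCP stream you would want to wrap the socket in a buffered file object first.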

`protoc-decode-lenprefix --decode ... foo.proto` for a stream of messages:

This script can also be used to fully decode length-prefixed messages (instead of just "raw decoding" them). Like the wrapped protoc command, it assumes you have access to the .proto files that define the protobuf message, and the invocation syntax is identical to protoc --decode. For example, applying the dd trick with protoc --decode to a Mesos task.info file looks like this:

dd bs=1 skip=4 if=task.info 2>/dev/null | \
protoc --decode mesos.internal.Task \
                      -I MESOS_CODE/src -I MESOS_CODE/include \
                      MESOS_CODE/src/messages/messages.proto

And the parameters are identical when using protoc-decode-lenprefix:

cat task.info | \
protoc-decode-lenprefix --decode mesos.internal.Task \
                      -I MESOS_CODE/src -I MESOS_CODE/include \
                      MESOS_CODE/src/messages/messages.proto

If you happen to have a binary file containing (multiple?) length-prefixed protobuf messages, protoc --decode_raw < file cannot parse it because of the length prefixes. A simple way around that is to split the file into its consecutive messages and then convert each with protoc.

My take:

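// Requires the protobuf-net package (ProtoBuf namespace).
using System;
using System.IO;
using ProtoBuf;

// filename is assumed to hold the path of the length-prefixed input file.
using (var fs = File.OpenRead(filename)) {
  var buffer = new byte[4096];
  int size;
  // Read each Base128 (varint) length prefix, then copy that many bytes into its own file.
  for (int part = 1; Serializer.TryReadLengthPrefix(fs, PrefixStyle.Base128, out size); part++) {
    // File.Create truncates, so a stale file from an earlier run leaves no trailing bytes.
    using (var writer = File.Create(string.Format("{0}[{1}].pb", filename, part))) {
      for (int bytesToRead = size; bytesToRead > 0; ) {
        int bytesRead = fs.Read(buffer, 0, Math.Min(bytesToRead, buffer.Length));
        if (bytesRead <= 0) // End of file.
          break;
        writer.Write(buffer, 0, bytesRead);
        bytesToRead -= bytesRead;
      }
    }
  }
}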
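If you'd rather not pull in protobuf-net, the same splitting can be sketched in Python. This assumes each message is preceded by a bare varint length with no field header, which is how I understand PrefixStyle.Base128 to behave in the call above; iter_varint_prefixed is a name invented for this example.

```python
import io


def iter_varint_prefixed(stream):
    """Yield each payload from a stream framed as [varint length][payload]..."""
    while True:
        length = 0
        shift = 0
        while True:
            chunk = stream.read(1)
            if not chunk:
                if shift == 0:
                    return  # clean end of stream between messages
                raise ValueError("truncated length prefix")
            byte = chunk[0]
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:  # high bit clear: last byte of the prefix
                break
            shift += 7
            if shift >= 70:
                raise ValueError("length prefix too long")
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message body")
        yield payload


if __name__ == "__main__":
    # A 3-byte message, then a 150-byte message (150 encodes as 0x96 0x01).
    framed = b"\x03abc" + b"\x96\x01" + bytes(150)
    for part, message in enumerate(iter_varint_prefixed(io.BytesIO(framed)), 1):
        print(part, len(message))
```

Writing each yielded payload to its own .pb file gives the same result as the loop above, ready for protoc --decode_raw.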

You could try forcing it through the Wireshark plugin, or you could probably borrow the "reader" part of one of the implementations (I know how I would do this in C#, for example, but I doubt that is what you meant).

However, be cautious: what a raw decoder shows as a "string" is really just the length-delimited wire type, which doesn't necessarily mean "string". It could be:

a UTF-8 string
a raw BLOB of arbitrary data
a sub-message
a "packed" array
(probably something else I'm forgetting)
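Since the wire format doesn't say which of these a length-delimited field is, a raw decoder can only guess. Here is a small Python sketch of one possible guessing order (sub-message first, then readable UTF-8, else opaque bytes). The function names are invented for this example, the ordering is just my choice, and the result is a heuristic: a short string can happen to parse as a valid message, and a packed array cannot be told apart from a blob without the .proto.

```python
def _read_varint(buf, pos):
    """Decode a varint at pos; return (value, next_pos) or raise ValueError."""
    result = 0
    shift = 0
    while pos < len(buf) and shift < 70:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
    raise ValueError("bad varint")


def looks_like_message(buf):
    """True if buf parses cleanly, end to end, as a sequence of fields."""
    if not buf:
        return False
    pos = 0
    try:
        while pos < len(buf):
            key, pos = _read_varint(buf, pos)
            field, wire_type = key >> 3, key & 0x07
            if field == 0:
                return False  # field number 0 is never valid
            if wire_type == 0:
                _, pos = _read_varint(buf, pos)
            elif wire_type == 1:
                pos += 8
            elif wire_type == 2:
                length, pos = _read_varint(buf, pos)
                pos += length
            elif wire_type == 5:
                pos += 4
            else:
                return False  # group or unknown wire type
    except ValueError:
        return False
    return pos == len(buf)  # a field that ran past the end is not a match


def guess_payload(buf):
    """Guess how to display a length-delimited payload (heuristic only)."""
    if looks_like_message(buf):
        return "sub-message"
    try:
        text = buf.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes"
    if all(ch.isprintable() or ch.isspace() for ch in text):
        return "string"
    return "bytes"


if __name__ == "__main__":
    for payload in (bytes.fromhex("0801"), b"foo", b"\xff\xfe"):
        print(payload, "->", guess_payload(payload))
```

Here the two bytes 08 01 are reported as a sub-message, b"foo" as a string (0x66 would be wire type 6, which is invalid), and b"\xff\xfe" as opaque bytes.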

Maybe try https://pb-decode.online. I'm the author, suggestions are welcome :)
