
Google protocol buffers huge in python

I started using the protocol buffer library, but noticed that it was using huge amounts of memory. pympler.asizeof shows that a single one of my objects is about 76k! Basically, it contains a few strings, some numbers, some enums, and some optional lists of the same. If I were writing the same thing as a C struct, I would expect it to be under a few hundred bytes, and indeed the ByteSize method returns 121 (the size of the serialized string).

Is that what you expect from the library? I had heard it was slow, but this is unusable and makes me more inclined to believe I'm misusing it.

Edit

Here is an example I constructed. This is a .proto file similar to, but simpler than, what I've been using:

package pb;

message A {
    required double a       = 1;
}

message B {
    required double b       = 1;
}

message C {
    required double c       = 1;
    optional string s       = 2;
}

message D {
    required string d       = 1;
    optional string e       = 2;
    required A a            = 3;
    optional B b            = 4;
    repeated C c            = 5;
}

And here I am using it:

>>> import pb_pb2
>>> a = pb_pb2.D()
>>> a.d = "a"
>>> a.e = "e"
>>> a.a.a = 1
>>> a.b.b = 2
>>> c = a.c.add()
>>> c.c = 5
>>> c.s = "s"
>>> import pympler.asizeof
>>> pympler.asizeof.asizeof(a)
21440
>>> a.ByteSize()
42

I have version 2.2.0 of protobuf (a bit old at this point) and Python 2.6.4.


Object instances have a bigger memory footprint in Python than in compiled languages. For example, the following code, which creates very simple classes mimicking your proto, prints 1440:

from pympler.asizeof import asizeof

class A:
  def __init__(self):
    self.a = 0.0

class B:
  def __init__(self):
    self.b = 0.0

class C:
  def __init__(self):
    self.c = 0.0
    self.s = ""

class D:
  def __init__(self):
    self.d = ""
    self.e = ""
    self.e_isset = 1
    self.a = A()
    self.b = B()
    self.b_isset = 1
    self.c = [C()]

d = D()
print asizeof(d)

I am not surprised that protobuf's generated classes take roughly 15 times more memory (21440 vs. 1440 here), as they add a lot of boilerplate.
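To see where the plain-Python baseline cost itself comes from: every ordinary instance carries its own attribute `__dict__`, which is a large chunk of the per-object footprint. A minimal sketch (class names `Plain` and `Slotted` are illustrative, not from protobuf) comparing a normal class with a `__slots__` class:

```python
import sys

class Plain(object):
    def __init__(self):
        self.a = 0.0

class Slotted(object):
    __slots__ = ('a',)  # no per-instance __dict__ is allocated
    def __init__(self):
        self.a = 0.0

plain = Plain()
slotted = Slotted()

# The attribute dict alone typically costs on the order of 100+ bytes,
# on top of the instance itself; the slotted instance avoids it entirely.
print(sys.getsizeof(plain.__dict__))
print(sys.getsizeof(slotted))
```

This only accounts for the baseline Python overhead; the extra factor for protobuf's generated classes comes from the additional bookkeeping objects they reference.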

The C++ version surely doesn't suffer from this.


Edit: This isn't likely your actual issue here, but we've just experienced a 45 MB protobuf message taking over 4 GB of RAM to decode. It appears to be this: https://github.com/google/protobuf/issues/156

which was known about in protobuf 2.6, and a fix was only merged onto master on March 7 this year: https://github.com/google/protobuf/commit/f6d8c833845b90f61b95234cd090ec6e70058d06

