Is there a serialization overhead in unindexed lists stored in app engine datastore?
My understanding is that indexed lists on App Engine are stored by duplicating all data for each value of the list. So multiple lists create Cartesian-product-type explosions, and getting an entity requires gathering all of these "rows" (up to 5000) into one entity, which Google calls serialization overhead and which is best avoided. Is my understanding correct?
If this is how app engine works I am wondering if an unindexed list is stored in the same way (requires the same data duplication, and the same serialization overhead) or if they are stored in binary or something since you never need to know what they are for querying/retrieving.
So I guess what I'm asking is: do unindexed lists have serialization overhead and a 5000-row limit? If so, how can I avoid this?
Thanks
You're confusing two different topics here. Indexed and unindexed lists alike are stored as part of a serialized entity protocol buffer, with each element of the list stored separately. There's space overhead here, since the name of the property is stored with each element: foo = [1,2,3] gets stored as [(foo, 1), (foo, 2), (foo, 3)], in effect.
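As a rough illustration of that layout, here is a minimal pure-Python sketch. It does not touch the real protocol buffer format; the helper name flatten_properties and the dict-based entity are assumptions made up for this example. It only shows how the property name ends up repeated once per list element.

```python
def flatten_properties(entity):
    """Expand each list-valued property into one (name, value) pair per element.

    `entity` is a plain dict standing in for a datastore entity (hypothetical
    model, not the real protocol buffer encoding).
    """
    pairs = []
    for name, value in entity.items():
        if isinstance(value, list):
            # The property name is repeated for every element of the list.
            pairs.extend((name, element) for element in value)
        else:
            pairs.append((name, value))
    return pairs


print(flatten_properties({"foo": [1, 2, 3]}))
# [('foo', 1), ('foo', 2), ('foo', 3)]
```

The longer the property name and the list, the more this repetition costs, regardless of whether the property is indexed.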
Indexed lists are automatically added to the built-in indexes, with each list element requiring an index row. If you have two lists with 5 elements each, 10 index rows will be required in the built-in indexes.
If you use custom indexes to define an index on multiple list properties, or the same list property multiple times, every unique combination of items will be indexed. So an entity with two lists a=[1,2,3] and b=[4,5,6] and an index on a and b will generate index entries [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)], and an entity with one list c=[7,8,9,10] and an index on c twice will generate index entries [(7, 8), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]. These are what's referred to as exploding indexes, and they only occur in custom indexes that specify at least two instances of list properties in a given index.
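The two examples above can be reproduced with a short sketch using itertools. This is only a model of the counts described in this answer, not the datastore's actual index builder; the function name custom_index_entries is hypothetical, and it assumes repeated mentions of the same property contribute unordered combinations of distinct elements, as in the c example.

```python
from collections import Counter
from itertools import combinations, product


def custom_index_entries(entity, index_properties):
    """Model the index entries a custom index would need for one entity.

    `index_properties` lists the property names in the index; a name may
    appear more than once. Simplification: repeated names are grouped
    together rather than kept in their exact index position.
    """
    repeats = Counter(index_properties)  # keeps first-seen order on Python 3.7+
    per_property = [
        list(combinations(entity[name], count))
        for name, count in repeats.items()
    ]
    # Cross every property's combinations with every other property's.
    return [
        tuple(value for group in combo for value in group)
        for combo in product(*per_property)
    ]


entity = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9, 10]}
print(len(custom_index_entries(entity, ["a", "b"])))  # 9
print(custom_index_entries(entity, ["c", "c"]))
# [(7, 8), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]
```

The entry count grows multiplicatively with list lengths, which is why an index over several long list properties gets expensive quickly.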
Unindexed list properties still take the same amount of space in the entity protocol buffer and still require the same time to serialize and deserialize that PB, but don't have any of the index overhead.
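To make the distinction concrete, the following sketch separates what is stored in the entity from what lands in the built-in indexes. It is a toy model under the assumptions stated in this answer (one stored pair per element, one built-in index row per indexed element); the helper names stored_pairs and builtin_index_rows are invented for illustration.

```python
def stored_pairs(entity):
    """(name, value) pairs serialized with the entity, indexed or not."""
    return [(name, v) for name, values in entity.items() for v in values]


def builtin_index_rows(entity, unindexed=()):
    """Rows needed in the built-in indexes: one per element of each indexed list."""
    return [
        (name, v)
        for name, values in entity.items()
        if name not in unindexed
        for v in values
    ]


entity = {"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10]}

print(len(stored_pairs(entity)))                               # 10
print(len(builtin_index_rows(entity)))                         # 10
print(len(builtin_index_rows(entity, unindexed={"a", "b"})))   # 0
```

Marking both lists as unindexed drops the index rows to zero, while the number of pairs serialized with the entity stays the same.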