开发者

GAE Lookup Table Incompatible with Transactions?

My Python High Replication Datastore application requires a large lookup table of between 100,000 and 1,000,000 entries. I need to be able to supply a code to some method that will return the value associated with that code (or None if there is no association). For example, if my table held acceptable English words then I would want the function to return T开发者_Go百科rue if the word was found and False (or None) otherwise.

My current implementation is to create one parentless entity for each table entry, and for that entity to contain any associated data. I set the datastore key for that entity to be the same as my lookup code. (I put all the entities into their own namespace to prevent any key conflicts, but that's not essential for this question.) Then I simply call get_by_key_name() on the code and I get the associated data.

The problem is that I can't access these entities during a transaction because I'd be trying to span entity groups. So going back to my example, let's say I wanted to spell-check all the words used in a chat session. I could access all the messages in the chat because I'd give them a common ancestor, but I couldn't access my word table because the entries there are parentless. It is imperative that I be able to reference the table during transactions.

Note that my lookup table is fixed, or changes very rarely. Again this matches the spell-check example.

One solution might be to load all the words in a chat session during one transaction, then spell-check them (saving the results), then start a second transaction that would spell-check against the saved results. But not only would this be inefficient, the chat session might have been added to between the transactions. This seems like a clumsy solution.

Ideally I'd like to tell GAE that the lookup table is immutable, and that because of this I should be able to query against it without its complaining about spanning entity groups in a transaction. I don't see any way to do this, however.

Storing the table entries in the memcache is tempting, but that too has problems. It's a large amount of data, but more troublesome is that if GAE boots out a memcache entry I wouldn't be able to reload it during the transaction.

Does anyone know of a suitable implementation for large global lookup tables?

Please understand that I'm not looking for a spell-check web service or anything like that. I'm using word lookup as an example only to make this question clear, and I'm hoping for a general solution for any sort of large lookup tables.


If you can, try and fit the data into instance memory. If it won't fit in instance memory, you have a few options available to you.

You can store the data in a resource file that you upload with the app, if it only changes infrequently, and access it off disk. This assumes you can build a data structure that permits easy disk lookups - effectively, you're implementing your own read-only disk based table.

Likewise, if it's too big to fit as a static resource, you could take the same approach as above, but store the data in blobstore.

If your data absolutely must be in the datastore, you may need to emulate your own read-modify-write transactions. Add a 'revision' property to your records. To modify it, fetch the record (outside a transaction), perform the required changes, then inside a transaction, fetch it again to check the revision value. If it hasn't changed, increment the revision on your own record and store it to the datastore.

Note that the underlying RPC layer does theoretically support multiple independent transactions (and non-transactional operations), but the APIs don't currently expose any way to access this from within a transaction, short of horrible (and I mean really horrible) hacks, unfortunately.

One final option: You could run a backend provisioned with more memory, exposing a 'SpellCheckService', and make URLFetch calls to it from your frontends. Remember, in-memory is always going to be much, much faster than any disk-based option.


First, if you're under the belief that a namespace is going to help avoid key collisions, it's time to take a step back. A key consists of an entity kind, a namespace, a name or id, and any parents that the entity might have. It's perfectly valid for two different entity kinds to have the same name or id. So if you have, say, a LookupThingy that you're matching against, and have created each member by specifying a unique name, the key isn't going to collide with anything else.

As for the challenge of doing the equivalent of a spell-check against an unparented lookup table within a transaction, is it possible to keep the lookup table in code?

Or can you think of an analogy that's closer to what you need? One that motivates the need to do the lookup within a transaction?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜