Python: preventing caching from slowing me down
I'm working on a web app with very aggressive caching. Virtually every component of the app is cached: views, partial views, controller output, disk loads, REST API calls, database queries. Everything that can be cached, at any level, is cached, all via decorators.
Naturally, this is blazing fast, since the vast majority of the HTML generation consists of pure functions with very few loads from disk/REST APIs. Furthermore, the few disk loads/database queries/REST API calls I do perform are also cached until invalidated, so unless something has just changed, they are really fast too.
So everything is blazing fast, but there is a hitch: all this stuff is cached in memory, in one huge global dictionary in my WSGI process, and can therefore be stored directly, without serialization. Once I start putting things in memcached, the time taken for cache hits doesn't change much, but putting things in the cache takes much longer. In general that's OK, but the initial "fill the cache" generation of each page goes from ~900ms (which is already pretty fast, considering how many flat files it reads from disk) to about ~9000ms. For reference, generating an arbitrary page takes something like 10ms once the cache is warmed up.
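For concreteness, the in-process scheme amounts to something like this (an illustrative sketch; my real decorators also handle keyword arguments and invalidation):

from functools import wraps

_CACHE = {}  # one big global dict in the WSGI process

def cached(func):
    # Hits cost one dict lookup and return the live Python object,
    # so no serialization is involved at all.
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _CACHE:
            _CACHE[key] = func(*args)
        return _CACHE[key]
    return wrapper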
Profiling the code, the vast majority of the time is going into cPickle. So the question is: how can I make this faster? Are there any in-memory caches to which I can pass my objects directly, without serialization? Or some other way to make caching my huge pile of objects faster? I could just go without a persistent memcached, but then my performance (or lack thereof) would be at the whim of the Apache/WSGI process manager.
If you are serializing Python objects and not simple datatypes, and have to use pickle, try cPickle.HIGHEST_PROTOCOL:
import cPickle

my_serialized_object = cPickle.dumps(my_object, cPickle.HIGHEST_PROTOCOL)
The default protocol is the old ASCII-based protocol 0, kept for compatibility with older versions of Python, but you likely don't care about that.
I just did a simple benchmark with a 1000-key dict, and it was almost an order of magnitude faster.
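A benchmark along those lines might look like this (a minimal sketch, assuming Python 2, where cPickle lives; on Python 3 plain pickle already defaults to a binary protocol, and the numbers will vary with your data and machine):

import cPickle
import timeit

data = dict((str(i), 'x' * 100) for i in range(1000))  # a 1000-key dict

slow = timeit.timeit(lambda: cPickle.dumps(data), number=1000)
fast = timeit.timeit(lambda: cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL),
                     number=1000)
print 'protocol 0: %.2fs   highest protocol: %.2fs' % (slow, fast)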
UPDATE: Since you appear to already be using the highest protocol, you are going to have to do some extra work to get more performance. Here is what I would do at this point:
Identify which classes are the slowest to pickle
Create a pair of methods on the class that implement a faster serialization scheme, say _to_string() and _from_string(s). The actual serialization can be tailored to what the object encompasses and how it's going to be used. For example, some objects may really only contain a simple string, such as a rendered template, and some may actually be sent to the browser as JSON, in which case you can simply serialize to JSON and serve it directly. Use the timeit module to verify that your method is actually faster.
In your decorator, check hasattr(obj, '_to_string') and use that instead if it exists (see the sketch after this list)
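A minimal sketch of what that check could look like in the caching decorator. The cache client, the cls parameter, and the _to_string()/_from_string() pair are all assumptions here, not part of any library; a real implementation would also need to sanitize or hash the keys for memcached:

import cPickle
from functools import wraps

def cached(cache, cls=None):
    # `cache` is assumed to be a memcached-style client with get()/set().
    # Pass `cls` when the wrapped function returns objects implementing
    # the fast _to_string()/_from_string() pair, so that reads know how
    # to reconstruct them; otherwise everything round-trips via cPickle.
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = '%s:%r' % (func.__name__, args)
            blob = cache.get(key)
            if blob is not None:
                if cls is not None:
                    return cls._from_string(blob)
                return cPickle.loads(blob)
            result = func(*args)
            # Prefer the object's own fast serializer when it exists
            if hasattr(result, '_to_string'):
                cache.set(key, result._to_string())
            else:
                cache.set(key, cPickle.dumps(result, cPickle.HIGHEST_PROTOCOL))
            return result
        return wrapper
    return decorator

Usage would be, e.g., @cached(mc, cls=RenderedPage) on a function returning RenderedPage instances, with mc being your memcached client.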
This approach lets you tackle the worst classes first and introduces minimal disruption to the code base.