Python: preventing caching from slowing me down
I'm working on a web app with very aggressive caching. Virtually every component of the app is cached: views, partial views, controller output, disk loads, REST API calls, database queries. Everything that can be cached, at any level, is cached, all via decorators.
Naturally, this is blazing fast, since the vast majority of the HTML generation consists of pure functions with very few loads from disk/REST APIs. Furthermore, the few disk loads/database queries/REST API calls I do perform are also cached until invalidated, so unless something has just changed, they are really fast too.
So everything is blazing fast, but there is a hitch: all this stuff is cached in memory, in one huge global dictionary in my WSGI process, and can therefore be stored directly, without serialization. Once I start putting things in memcached, the time taken for cache hits doesn't change much, but putting things in the cache takes much longer. In general that's OK, but the initial "fill the cache" generation of each page goes from ~900ms (which is already pretty fast, considering how many flat files it reads from disk) to about ~9000ms. For reference, generating an arbitrary page takes something like 10ms once the cache is warmed up.
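For concreteness, the in-process scheme amounts to something like this (an illustrative sketch; my real decorators also handle keyword arguments and invalidation):

from functools import wraps

_CACHE = {}  # one big global dict in the WSGI process

def cached(func):
    # Hits cost one dict lookup and return the live Python object,
    # so no serialization is involved at all.
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _CACHE:
            _CACHE[key] = func(*args)
        return _CACHE[key]
    return wrapper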
Profiling the code, the vast majority of the time is going into cPickle. So the question is: how can I make this faster? Are there any in-memory caches to which I can pass my objects directly, without serialization? Or some other way to make caching my huge pile of objects faster? I could just go without a persistent memcached, but then my performance (or lack thereof) would be at the whim of the Apache/WSGI process manager.
If you are serializing Python objects and not simple datatypes, and have to use pickle, try cPickle.HIGHEST_PROTOCOL:
import cPickle

my_serialized_object = cPickle.dumps(my_object, cPickle.HIGHEST_PROTOCOL)
The default protocol is the old ASCII-based protocol 0, kept for compatibility with older versions of Python, but you likely don't care about that.
I just did a simple benchmark with a 1000-key dict, and it was almost an order of magnitude faster.
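A benchmark along those lines might look like this (a minimal sketch, assuming Python 2, where cPickle lives; on Python 3 plain pickle already defaults to a binary protocol, and the numbers will vary with your data and machine):

import cPickle
import timeit

data = dict((str(i), 'x' * 100) for i in range(1000))  # a 1000-key dict

slow = timeit.timeit(lambda: cPickle.dumps(data), number=1000)
fast = timeit.timeit(lambda: cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL),
                     number=1000)
print 'protocol 0: %.2fs   highest protocol: %.2fs' % (slow, fast)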
UPDATE: Since you appear to already be using the highest protocol, you are going to have to do some extra work to get more performance. Here is what I would do at this point:
Identify which classes are the slowest to pickle
Create a pair of methods on the class that implement a faster serialization scheme, say _to_string() and _from_string(s). The actual serialization can be tailored to what the object encompasses and how it's going to be used. For example, some objects may really only contain a simple string, such as a rendered template, and some may actually be sent to the browser as JSON, in which case you can simply serialize to JSON and serve it directly. Use the timeit module to verify that your method is actually faster.
In your decorator, check hasattr(obj, '_to_string') and use that instead if it exists (see the sketch after this list)
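A minimal sketch of what that check could look like in the caching decorator. The cache client, the cls parameter, and the _to_string()/_from_string() pair are all assumptions here, not part of any library; a real implementation would also need to sanitize or hash the keys for memcached:

import cPickle
from functools import wraps

def cached(cache, cls=None):
    # `cache` is assumed to be a memcached-style client with get()/set().
    # Pass `cls` when the wrapped function returns objects implementing
    # the fast _to_string()/_from_string() pair, so that reads know how
    # to reconstruct them; otherwise everything round-trips via cPickle.
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = '%s:%r' % (func.__name__, args)
            blob = cache.get(key)
            if blob is not None:
                if cls is not None:
                    return cls._from_string(blob)
                return cPickle.loads(blob)
            result = func(*args)
            # Prefer the object's own fast serializer when it exists
            if hasattr(result, '_to_string'):
                cache.set(key, result._to_string())
            else:
                cache.set(key, cPickle.dumps(result, cPickle.HIGHEST_PROTOCOL))
            return result
        return wrapper
    return decorator

Usage would be, e.g., @cached(mc, cls=RenderedPage) on a function returning RenderedPage instances, with mc being your memcached client.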
This approach lets you tackle the worst classes first and introduces minimal disruption to the code base.