How to implement caching with WSGI?
I'd like to build a caching proxy as a Python WSGI middleware and wonder how this middleware could find out whether a cached page is expired. As far as I know WSGI doesn't support something like the getLastModified(HttpServletRequest req) method of Java Servlets.
What I'm not looking for is a per-client caching strategy with "If-Modified-Since" or "ETags". I want to cache content for all clients, like a proxy server. So the cache has to check whether the WSGI app (or the resource, in REST terms) was modified and has thus expired in the cache.
    client               cache                wsgi app
    ------               -----                --------
      |   get /some/x      |                     |
      |------------------->|  /some/x expired?   |
      |                    |-------------------->|
      |                    |                     |
      |                    |   update /some/x    |
      |                    |    if modified      |
      |  return /some/x    |<--------------------|
      |<-------------------|
Is it possible to implement it without by-passing WSGI?
Of course you can. First of all, only you know whether a resource has expired: it might come from a file or be an article in a database, so there is no universal "expired or not" method. Here is a simple example:
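    class WSGICache(object):

        def __init__(self, app):
            self.app = app
            # Maps PATH_INFO -> (status, headers, body chunks)
            self.cache = {}

        def is_expired(self, environ):
            """Determine whether the requested resource has expired."""
            # FIXME: decide by looking at PATH_INFO. For a file, compare
            # its last-modified time; for an object from a database,
            # check when the record was last modified.
            return False

        def __call__(self, environ, start_response):
            path = environ['PATH_INFO']
            cached = self.cache.get(path)
            # Do we have a valid cache entry?
            if cached is None or self.is_expired(environ):
                captured = []

                def capture(status, headers, exc_info=None):
                    # Remember status and headers so cache hits can replay them
                    captured[:] = [status, list(headers)]

                body = list(self.app(environ, capture))
                cached = (captured[0], captured[1], body)
                self.cache[path] = cached
            status, headers, body = cached
            # Call start_response on hits as well as misses
            start_response(status, headers)
            return body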
For production use, though, I suggest an existing caching system such as Beaker; I think it should be good enough for what you want. I didn't test the code above, but a middleware like this should be able to do what you want.
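For file-backed resources, the is_expired check described above could compare the file's modification time with the one recorded when the entry was cached. The following is only a sketch under that assumption: FileBackedCache, doc_root, and the mapping from PATH_INFO to a file below doc_root are hypothetical names and choices, not part of WSGI or of the answer above.

```python
import os


class FileBackedCache(object):
    """Caching middleware that treats a changed file mtime as expiry.

    Assumption: every PATH_INFO maps to a file below doc_root.
    """

    def __init__(self, app, doc_root):
        self.app = app
        self.doc_root = doc_root
        # path -> (mtime when cached, status, headers, body chunks)
        self.cache = {}

    def _mtime(self, path):
        filename = os.path.join(self.doc_root, path.lstrip('/'))
        try:
            return os.path.getmtime(filename)
        except OSError:
            return None  # no such file: nothing to compare against

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '/')
        mtime = self._mtime(path)
        entry = self.cache.get(path)
        # Miss, unknown mtime, or file changed since caching: ask the app
        if entry is None or mtime is None or entry[0] != mtime:
            captured = {}

            def capture(status, headers, exc_info=None):
                captured['status'] = status
                captured['headers'] = list(headers)

            body = list(self.app(environ, capture))
            entry = (mtime, captured['status'], captured['headers'], body)
            if mtime is not None:
                self.cache[path] = entry
        start_response(entry[1], entry[2])
        return entry[3]
```

Wrapping an application would look like application = FileBackedCache(application, '/path/to/files'). Resources that live in a database would need a different check, for example a last-modified column.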
When you say 'build', do you mean configure one or develop one yourself? There are tons of HTTP cache tools out there. I would recommend looking at:
- Squid ("Optimising Web Delivery")
- or mod_cache in Apache
With these tools you can configure timeouts to flush the caches. The question, I guess, is how dynamic your content is. If it is fairly static, any of these tools should work for your case.
For WSGI, there is a configuration example that uses the Squid cache.
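If you would rather stay inside WSGI, the timeout idea behind those tools can be approximated in a middleware. This is a rough sketch, not how Squid or mod_cache work internally: TTLCache, ttl, and clock are hypothetical names, and entries simply expire a fixed number of seconds after they were stored.

```python
import time


class TTLCache(object):
    """Caching middleware whose entries expire after ttl seconds."""

    def __init__(self, app, ttl=60.0, clock=time.time):
        self.app = app
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        # path -> (time stored, status, headers, body chunks)
        self.cache = {}

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '/')
        now = self.clock()
        entry = self.cache.get(path)
        # Miss, or entry older than ttl: call the wrapped app again
        if entry is None or now - entry[0] >= self.ttl:
            captured = {}

            def capture(status, headers, exc_info=None):
                captured['status'] = status
                captured['headers'] = list(headers)

            body = list(self.app(environ, capture))
            entry = (now, captured['status'], captured['headers'], body)
            self.cache[path] = entry
        start_response(entry[1], entry[2])
        return entry[3]
```

A timeout like this does not detect modifications; it only bounds how stale a page can get, which is why it suits fairly static content.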
You can try shelve: https://docs.python.org/2/library/shelve.html
If you want to use it for caching web pages, you can store each page's markup in the shelf, return it to the client from there, and have the WSGI app update the stored page when required.
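A minimal sketch of that idea, assuming pages are keyed by their path: cached_page, invalidate, db_path, and render are hypothetical names made up for this illustration; only shelve.open and contextlib.closing come from the standard library.

```python
import shelve
from contextlib import closing


def cached_page(db_path, key, render):
    """Return the page stored under key, rendering and storing it on a miss."""
    with closing(shelve.open(db_path)) as db:
        if key not in db:
            db[key] = render()
        return db[key]


def invalidate(db_path, key):
    """Drop a stored page so that the next request renders it again."""
    with closing(shelve.open(db_path)) as db:
        if key in db:
            del db[key]
```

The application would call invalidate whenever the underlying resource changes. A shelf is a plain file on disk, so concurrent writers (several worker processes, for instance) would need their own locking.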