开发者

Patterns for caching related data

I'm currently developing the foundation of a an application, and looking for ways to optimize performance. My setup is based on the CakePHP framework, but I believe my question is relevant to any technology stack, as it relates to data caching.

Let's take a typical post-author relation, which is represented by 2 tables in my db. When I query the database for a specific blog post, at the same time the built-in ORM functionality in CakePHP also fetches the author of the post, comments on the post, etc. All of this is returned as one big-ass nested array, which I store in cache using a unique identifier for the concerned blog post.

When updating the blog post, it is child play to destroy the cache for the post, and have it regenerated with the next request.

But what happens when not the main entity (in this case the blog post) gets updated, but rather some of the related data? For example, a comment could be deleted, or the author could update his avatar. Are there any approaches (patterns) which I could consider for tracking updates to related data, and applying updates to my cache accordingly?

I'm curious to hear whether you've also run into similar challenges, and how you have managed to potentially overcome the hurdle. Feel free to pr开发者_JAVA百科ovide an abstract perspective, if you're using another stack on your end. Your views are anyhow much appreciated, many thanks!


It is rather simple, cache entries can be

  • added
  • destroyed

You should take care of destroying cache entries when related data change (so in application layer in addition to updating the data you should destroy certain types of cached entries when you update certain tables; you keep track of dependencies by hard-coding it).

If you'd like to be smart about it you could have your cache object state their dependencies and cache the last update times for your DB tables as well.

Then you could

  • fetch cached data, examine dependencies,
  • get update times for relevant DB tables and
  • in case the record is stale (update time of a table that your big ass cache entry depends on is later then the time of the cache entry) drop it and get fresh data from the database.

You could even integrate the above into your persistence layer.

EDIT:
Of course the above is for when you want to have consistent cache. Sometimes, and for some data, you can relax the consistency requirements and there are scenarios where simple TTL will be good enough (for a trivial example, if you have ttl of 1 sec, you should mostly be out of trouble with users and can help data processing; and with higher times you might still be ok - for example let's say you are caching the list of country ISO codes; your application might be perfectly ok if you say let's cache this for 86400 sec).

Furthermore, you could also track the times of information presented to user, for example

  • let's say user has seen data A from cache and that we know that this data was created/modified at time t1
  • user makes changes to the data A (and makes it data B) and commits the change
  • the application layer can then examine if the data A is still as in DB (if the cached data upon which the user made decisions and/or changes was indeed fresh)
  • if it was not fresh then there is a conflict and user should confirm the changes

This has a cost of extra read of data A from DB, but it occurs only on writes. Also, the conflict can occur not only because of the cache, but also because of multiple users trying to change the data (i.e. it is related to locking strategies).


One Approach for memcached is to use tags ( http://code.google.com/p/memcached-tag/ ). For Example, you have your Post "big-ass nested array" lets say, it inclused the autors information, the post itself and is shown on the frontpage and in some box in the sidebar. So it gets the tags: frontpage, {auhothor-id}, sidebar, {post-id} - now if someone changes the Author Information you flush every cache entry with the tag {author-id}. But thats only one Solution, and only for Cache Backends that support Tags, for example not APC (afaik). Hope That gave you an example.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜