
Fastest database engine for caching?

I use MySQL for my primary database, where I keep the actual objects. When an object is rendered using a template, rendering takes a lot of time.

Because of that, I've decided to cache the produced HTML. Right now I store the cache in files, named appropriately, and it does work significantly faster. I am, however, aware that it is not the best way to do so.

I need a (preferably key-value) database to store my cache in. I cannot use a caching proxy because I still need to process the cached HTML. Is there such a database with a PHP front end?

Edit: If I use memcached, and I cache about a million pages, won't I run out of RAM?

Edit 2: And again, I have a lot of HTML to cache (gigabytes of it).


If I use memcached, and I cache about a million pages, won't I run out of RAM?

Memcached

Memcached is also a really solid product (though I like Redis more), used at many big sites to keep them up and running. Almost all active tweets (the ones users fetch) are stored in memcached for insane performance.

If you want to be fast, you should keep your active dataset in memory. But yes, if the dataset is bigger than your available memory, store the data in a persistent datastore such as MySQL (you should always do this anyway, because memcached is volatile). When an item is not available in memory, fetch it from the datastore and cache it in memcached for future requests, with an expiration time.
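That read path (commonly called cache-aside) can be sketched as follows. This is a minimal illustration, assuming plain Python dicts as hypothetical stand-ins for both memcached and MySQL; `TTLCache`, `get_page` and `render` are made-up names, not memcached's API.

```python
import time


class TTLCache:
    """In-process stand-in for memcached: each value expires after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave exactly like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def get_page(key, cache, datastore, render, ttl=300):
    """Cache-aside: try the cache first, fall back to the persistent store."""
    html = cache.get(key)
    if html is not None:
        return html              # cache hit: no rendering needed
    obj = datastore[key]         # cache miss: read the object from the "MySQL" stand-in
    html = render(obj)           # the expensive template rendering step
    cache.set(key, html, ttl)    # keep the rendered HTML, with an expiration time
    return html
```

In a real deployment the `TTLCache` would be replaced by a memcached client, whose set call likewise takes an expiration time; the datastore remains the source of truth whenever the cache has evicted or expired an entry.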

Redis

I really like Redis because it is an advanced key-value store with insane performance.

Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis supports different kinds of sorting.

Redis has a VM (virtual memory), so you don't need a separate persistent datastore. I really like Redis because of all the available commands (power :)). This tutorial by Simon Willison shows a lot of the raw power Redis has.

Speed

Redis is pretty fast: 110,000 SETs/second and 81,000 GETs/second on an entry-level Linux box. Check the benchmarks.
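As a back-of-the-envelope check against the question's million pages ($N = 10^6$), and assuming (optimistically) that these single-box rates still hold for page-sized HTML values, filling and then reading the whole cache would take roughly:

```latex
\begin{aligned}
t_{\mathrm{SET}} &= \frac{N}{r_{\mathrm{SET}}} = \frac{10^{6}}{110{,}000\ \mathrm{s^{-1}}} \approx 9.1\ \mathrm{s},\\
t_{\mathrm{GET}} &= \frac{N}{r_{\mathrm{GET}}} = \frac{10^{6}}{81{,}000\ \mathrm{s^{-1}}} \approx 12.3\ \mathrm{s}.
\end{aligned}
```

Throughput for multi-kilobyte values is likely lower than for small benchmark values, so these are best read as lower bounds.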

Commits

Redis is more actively developed: at the time of writing, antirez (the Redis author) committed something 8 hours ago, whereas memcached's latest commit was on 12 November.

Install Redis

Redis is insanely easy to install. It has no dependencies. To compile and start it, you only have to run:

make
./redis-server redis.conf  # start redis

Awesome :)

Install Memcached

Memcached has a dependency (libevent), which makes it more difficult to install.

wget http://memcached.org/latest
tar -zxvf memcached-1.x.x.tar.gz
cd memcached-1.x.x
./configure
make && make test
sudo make install

That is not the whole story, though, because memcached depends on libevent, and ./configure will fail if libevent is missing. Then again, there are packages, which are convenient but require root to install.


Redis is pretty fast: 110,000 SETs/second

If speed is a concern, why use the network layer?

According to: http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf

  • InnoDB inserts ....................43,000 records per second AT ITS PEAK*;
  • TokuDB inserts ....................34,000 records per second AT ITS PEAK*;
  • G-WAN KV inserts ....100,000,000 records per second

(*) After a few thousand inserts, performance degrades severely for InnoDB and TokuDB, which end up writing to disk once their own cache, the system cache, and the disk controller cache are full. See the PDF for an interesting discussion of the problems caused by the topology of the InnoDB index (which severely breaks locality, whereas the fractal tree topology scales much better, though still not linearly).


To break the answers down into logical views:

  • Flat files are as fast as the storage medium being used (DISK or RAM)
  • An environment which caches the MRU (Most Recently Used) items in RAM
  • A smart/fast hash index to all locations (which is what SQL systems rely on)

That combination will get you the best solution that you are looking for.
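A minimal sketch of that combination, assuming a local directory of HTML files as the flat-file tier and an `OrderedDict` as the RAM tier: the dict is the hash index, its ordering tracks recency, and the least recently used entry is evicted when the RAM budget is exceeded. `TwoTierCache`, `max_items` and the SHA-1 file naming are illustrative choices, not something the answer prescribes.

```python
import hashlib
import os
from collections import OrderedDict


class TwoTierCache:
    """RAM holds the most recently used pages; every page also lives in a flat file."""

    def __init__(self, root, max_items=1000):
        self.root = root
        self.max_items = max_items
        self.ram = OrderedDict()  # hash index; insertion order = recency (MRU at the end)
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Hash the key (e.g. a URI) into a fixed-length, filesystem-safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest + ".html")

    def _remember(self, key, html):
        self.ram[key] = html
        self.ram.move_to_end(key)          # mark as most recently used
        if len(self.ram) > self.max_items:
            self.ram.popitem(last=False)   # evict the least recently used entry

    def put(self, key, html):
        path = self._path(key)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp, path)              # atomic rename: readers never see half a file
        self._remember(key, html)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]           # RAM hit
        try:
            with open(self._path(key), encoding="utf-8") as f:
                html = f.read()            # disk hit: promote it into RAM
        except FileNotFoundError:
            return None                    # miss: the caller must render the page
        self._remember(key, html)
        return html
```

Keeping the full set on disk and only the hot subset in RAM is what lets the dataset exceed available memory, which is the concern raised in the question.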

For argument's sake, flat file or not (excluding a MEMORY ONLY solution), all engines use some form of flat file. The magic is knowing where your data is and tuning reads to pull the data back as efficiently as possible. In the '80s at IBM we used a fixed-record-length flat file design, which wasn't optimized for disk space; it was optimized for I/O. Indexes then were based on Record Length * ROWID.
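Addressing by Record Length * ROWID can be sketched as below. This is a minimal illustration of the general idea, not the IBM design itself; the 64-byte `RECORD_LENGTH`, the space padding, and the function names are hypothetical. Because every record has the same size, locating one takes a multiplication and a single seek, with no separate index structure.

```python
RECORD_LENGTH = 64  # hypothetical fixed record size, in bytes


def write_record(f, rowid, data):
    """Write `data` (bytes) into slot `rowid`, space-padded to RECORD_LENGTH."""
    if len(data) > RECORD_LENGTH:
        raise ValueError("record longer than RECORD_LENGTH")
    f.seek(rowid * RECORD_LENGTH)  # offset = Record Length * ROWID
    f.write(data.ljust(RECORD_LENGTH, b" "))


def read_record(f, rowid):
    """Read slot `rowid` with one seek and one fixed-size read; no index lookup."""
    f.seek(rowid * RECORD_LENGTH)
    # Padding is stripped, so trailing spaces in the original data are lost in this sketch.
    return f.read(RECORD_LENGTH).rstrip(b" ")
```

The file would be opened in binary read/write mode (`open(path, "r+b")`). Short records waste the unused part of their slot, which is the "not optimized for disk space, optimized for I/O" trade-off described above.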

Now, to your need: for ultimate performance at scale, introduce a smart combination. We host over 1 million companies, with over 10 pages per company: 10 million files, plus JS, CSS and images.

Theory 1) - You know your limitation is RAM: spool dynamic content to disk when feasible, and drop features such as hit counters. Leverage NGINX or HIGHLY tune APACHE (or, as we have done since 2001, write your own web server). The whole concept is to leverage RAM for the MOST USED content and to have a very intelligent lookup for disk-based content; normally the URI is fine as the lookup key.

Theory 2) - Trend Analysis and User Anticipation: I have spent years researching and developing systems that track trends. If I know a user will go down path A, B, C, D, then when he hits B, I have already prefetched C and D. If I know a user will go A, B, but may then go to E and then D, you have the choice to pre-cache C and E, or, for RAM's sake, prefetch only D and fetch C or E on demand when the user picks one.
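One simple way to approximate this kind of user anticipation is to count observed page-to-page transitions and prefetch the most likely next pages. This is a first-order model chosen purely for illustration; the answer does not say how its trend analysis works, and `PrefetchPredictor`, `budget` and `depth` are hypothetical names.

```python
from collections import Counter, defaultdict


class PrefetchPredictor:
    """Learns page-to-page transitions from observed click paths (first-order model)."""

    def __init__(self):
        # page -> Counter of the pages users went to next
        self.transitions = defaultdict(Counter)

    def observe(self, path):
        """Record one user's click path, e.g. ["A", "B", "C", "D"]."""
        for current, nxt in zip(path, path[1:]):
            self.transitions[current][nxt] += 1

    def prefetch_for(self, page, budget=1):
        """Up to `budget` most likely next pages; a smaller budget spends less RAM."""
        counts = self.transitions.get(page, Counter())
        return [p for p, _ in counts.most_common(budget)]

    def prefetch_chain(self, page, depth=2):
        """Follow the single most likely transition `depth` steps ahead (B -> C -> D)."""
        chain = []
        for _ in range(depth):
            counts = self.transitions.get(page)
            if not counts:
                break  # no history for this page: nothing to prefetch
            page = counts.most_common(1)[0][0]
            chain.append(page)
        return chain
```

After observing A, B, C, D three times and A, B, E, D once, `prefetch_chain("B")` suggests C then D (the first scenario above), while `prefetch_for("B", budget=2)` returns C and E (pre-caching both branches).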

The web server we have developed, along with some accounting systems I have developed over the years, integrates Theory 2 (prefetching) with a combination of smart caching. We also store the content on disk deflated, so the transport layer simply pumps the content onto the stack, as 99% of browsers support deflated streams. (It's faster to inflate before sending for that 1% than to deflate 99% of the time.)
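Serving pre-deflated content can be sketched with Python's `zlib` module, whose default output is the zlib format that HTTP's `deflate` content coding uses. This is a sketch under assumptions: `store_deflated` and `serve` are hypothetical names, headers are returned as a plain dict rather than through any particular web server's API, and q-values in `Accept-Encoding` are ignored.

```python
import zlib


def store_deflated(html):
    """Compress once, at cache-write time (zlib format, as used by HTTP 'deflate')."""
    return zlib.compress(html.encode("utf-8"), 9)


def serve(stored, accept_encoding):
    """Return (body, headers) for a cached page stored in deflated form."""
    # Parse "gzip, deflate;q=0.5, br" into ["gzip", "deflate", "br"] (q-values ignored).
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "deflate" in encodings:
        return stored, {"Content-Encoding": "deflate"}  # pass through: no per-request work
    return zlib.decompress(stored), {}                  # rare client: inflate on the fly
```

The compression cost is paid once per cache write instead of once per request, and only clients without deflate support pay for decompression.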

As for MEMCACHED and SWAP: disk speed is your enemy, but tying up the kernel to manage that enemy is an epic fail! If you want to beat MEMCACHED performance, learn how to set up a RAM DISK and keep your deflated, HOT requested items there!

** DISCLAIMER: This all assumes that you have enough bandwidth that your infrastructure/user bandwidth is not the bottleneck and your servers are. @3FINC


http://memcached.org/ + http://php.net/manual/en/book.memcache.php


Flat files are "technically" the fastest, but if you're looking for something with a PHP front end that just screams, take a look at Postgres.

http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#Raw_Speed

For memory caching, look at memcached:

http://memcached.org/

*Edit: regarding your edit (redundant, yes): if you cache that volume in memory, you will have issues. Look into Postgres columnar table queries or a quasi-custom flat file solution.


As far as I know, using the file system is actually the fastest way to cache rendered templates without resorting to storing them in memory. Any database would simply add overhead and would make the whole thing slower by comparison.


I would use memcached or APC, depending on whether you need the cache shared between servers. Memcached is a daemon you connect to, whereas APC lives inside the PHP instance (a little faster). Both of them store the cache in memory, so it's blazing fast.


In fact, storing the cache in files really is the fastest way to do this. But if you're really interested in putting it into a database, you can check out MongoDB. MongoDB is a document-oriented database, so there are no server-side joins; that's why it's faster than MySQL (it works with PHP, and there are a lot of benchmarks on the internet).
