
Using a central database server for many sites: plausible?

Basically, I need some parts of the database data synchronized across up to several dozen sites. The ideal solution would be a central server hosting that data.

Each page load would have to fetch data from both database servers, the local one and the remote one, and writes to the remote server will be quite common too.

While the DB server can be as fast as the hardware allows, I'm wary of the bottlenecks:

  • Multiple database connections must be established on every page load.
  • Latency of the signal traveling between two physical locations.

Am I right to worry? Would it be wiser to sync the databases with cron jobs or some other technology?


Along with assigning a bounty, I'm adding to the question in the hope that an expert with real-life experience of this comes along:

What other technologies are there (besides cron) to sync MySQL databases?


This question really comes down to your situation, and I believe you've identified the two main problems with the central DB solution - so yes, you are right to be concerned.

I would personally opt to sync the data out to the servers using a cron job (or whatever method you choose), cutting down on hardware costs and page load times. To me this is the cruder solution technically, but its benefits (faster page loads, no dependency on the central DB, lower costs) make it the right one here.
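As a rough sketch of the cron approach, a periodic sync can be as little as one crontab entry piping mysqldump into the local server. All hosts, credentials, database, and table names below are placeholders, and this naive full-dump approach only suits small, mostly-read tables:

```shell
# Every 15 minutes, dump the shared tables from the central server
# and load them into the local database (placeholder names throughout).
*/15 * * * * mysqldump -h central.example.com -u sync -p'secret' \
    appdb shared_users shared_config \
    | mysql -h 127.0.0.1 -u sync -p'secret' appdb
```

For larger tables you would sync only changed rows (e.g. by a `last_modified` column) rather than re-importing everything.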

Alternatively, you could always set up a small MySQL database on a remote server, create a few test websites, and run some benchmarks; that would give you concrete data on whether the loading times are acceptable.


MySQL replication is definitely the way to go. The problem with a single database server is that if the load becomes too high, all your sites go down at once. You want to spread the load as much as possible, because with one server, an outage or overload takes everything with it.

Some things to keep in mind when dealing with replication

  • You want at least 2 (preferably 3 or more, 1 master and 2 slaves) database servers.
  • You never write to the slave servers. All write operations go to the master, and replication syncs them to the slaves shortly after.
  • You always read from the slave servers (unless you need to guarantee you have the most up-to-date data). Separating reads and writes across servers can significantly improve performance.
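The read/write split above can be sketched as a small router that decides which server a statement goes to. This is an illustrative sketch, not a real driver: the `QueryRouter` class and the DSN strings are made up, and a production setup would do this at the connection-pool or proxy level.

```python
# Minimal sketch of read/write splitting: SELECT-style statements go
# to a randomly chosen replica, everything else (and reads that must
# be fresh) goes to the master. All names here are illustrative.
import random

class QueryRouter:
    def __init__(self, master_dsn, replica_dsns):
        self.master_dsn = master_dsn
        self.replica_dsns = replica_dsns

    def pick(self, sql, need_fresh=False):
        """Return the DSN this statement should run against."""
        is_read = sql.lstrip().upper().startswith(("SELECT", "SHOW"))
        if is_read and not need_fresh:
            return random.choice(self.replica_dsns)  # spread read load
        return self.master_dsn  # writes and fresh reads hit the master

router = QueryRouter("mysql://master", ["mysql://replica1", "mysql://replica2"])
print(router.pick("SELECT * FROM users"))           # one of the replicas
print(router.pick("UPDATE users SET name = 'x'"))   # mysql://master
```

The `need_fresh` flag covers the "unless you need the most up-to-date data" case: a read that must see a just-committed write is sent to the master instead.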

Throw in a load balancing server and your database load woes go away!


Any method that relies on offline sync throws away the benefits of MySQL replication.

(Given the somewhat unclear situation you've described,)
your solution can be as simple as keeping READ and WRITE operations separate.

That means, on the local database:

  1. make the synced data read-only locally - reads for that data come from the local copy of the centralized database
  2. write operations are committed to the centralized database (instead of to the local database)
  3. the centralized database then replicates the updates to all local databases
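The steps above map onto MySQL's own settings. A minimal sketch (exact behaviour varies by version, and note that `read_only` does not restrict users with the SUPER privilege):

```sql
-- On each local (replica) server: block application writes to the
-- replicated data so the central server remains the single writer.
SET GLOBAL read_only = 1;

-- Application connections then issue all writes against the central
-- (master) server; replication carries them back out to the replicas.
```

If the local database also holds its own independent data (benefit 2 below), that data lives in a separate schema that is excluded from replication, with writes to it allowed locally.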

problem

  1. replication delay due to network latency

benefits

  1. data integrity: write operations can only be performed on the centralized server, and replication copies the changes out to the various local databases
  2. each local database can still accept its own writes for a separate set of data/tables
  3. reads from the local database are much faster than from the centralized one (and reads are usually far more frequent than writes)


The way Google solved this issue (you can find some of the information here; I'm sorry I don't have the link to the actual published paper describing it) is more or less through a series of triggers.

There is one (and by one, I mean thousands) central data hub and a series of clones. Each time a write is needed, a lock is requested of the hub, the write is performed on the clone, which then forwards the change to the hub (thereby releasing the lock). The hub then pushes the data to all of the other clones.

This means that read access can stay almost instant (you have a localized clone per site instance). Local write access will be quick as well. Everything else can be handled asynchronously so that the two servers only need to communicate a lock request and a lock received message before the write starts, and the push can happen after the user has moved on.
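The hub-and-clone flow described above can be simulated in a few lines. This is a toy, single-process sketch: the classes and names are made up, the push to other clones happens synchronously here (the real design does it asynchronously), and a single `threading.Lock` stands in for the hub's lock service.

```python
# Toy simulation of the hub/clone protocol: a clone requests the hub's
# lock, the write lands on the originating clone, and the hub pushes
# the change to every other clone. All names are illustrative.
import threading

class Hub:
    def __init__(self):
        self.lock = threading.Lock()
        self.clones = []

    def write_through(self, origin, key, value):
        with self.lock:                  # lock requested / received
            origin.data[key] = value     # write performed on the clone
            for clone in self.clones:    # hub pushes to the other clones
                if clone is not origin:
                    clone.data[key] = value

class Clone:
    def __init__(self, hub):
        self.data = {}
        hub.clones.append(self)

hub = Hub()
a, b = Clone(hub), Clone(hub)
hub.write_through(a, "user:1", "alice")
print(b.data["user:1"])  # "alice" -- the change reached the other clone
```

Reads never touch the hub at all; each site serves them from its local clone, which is what keeps read access almost instant.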

This may be a tad much for your needs, but that is how Google does it.


First, a warning: what you're trying to do is not easy. Whilst MySQL supports master/slave replication, and you can have multiple masters and slaves arranged in all sorts of tiers, what you really need to think about is "how do I recover from a database server failure?" - do you promote a slave? What about consistency (there's no guarantee replication had completed on every slave at the moment of failure)? And so on. You also need to consider schema modifications: everything is fine and dandy as long as all servers share the same schema, but as soon as you need to push a code update that requires a simultaneous database change, you can't rely on that schema change having propagated to all the replicas.

Okay, warning over; so how do you do it? The easiest way is to fire up the latest version of phpMyAdmin, which lets you configure replication quickly and easily. Before you do that, make sure binary logging is turned on on all the MySQL servers, as this will be your crash-recovery saviour: http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
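For reference, the binary-logging prerequisite is only a couple of lines of server configuration. A minimal sketch with placeholder values (option names here are the classic MySQL 5.x ones):

```ini
[mysqld]
server-id = 1          # must be unique per server; slaves take 2, 3, ...
log-bin   = mysql-bin  # enables the binary log that replication reads from
```

Each slave is then pointed at the master with `CHANGE MASTER TO` (giving the master's host, a replication user, and the binlog coordinates), followed by `START SLAVE`; `SHOW SLAVE STATUS` reports whether replication is running and how far behind it is.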

Where you site your servers is the next big question. If your users aren't geographically dispersed and your query loads are low-ish, you can probably host them all on a private network in the same data centre. Master-slave replication will give you a large performance uplift in any case, since all database reads should go to the slaves and only writes to the master.

If you need to geo-locate the servers so they can't all sit in the same data centre, things get a little more difficult: you now have latency to contend with. Since the internet is not instantaneous, a write made to the master takes time to propagate to the slaves, so a SELECT made very shortly after the write probably won't return the new data - it won't have replicated yet. This is called "eventual consistency", and it's relatively easy to handle once you recognise it's going to happen and code for it - i.e. never assume the data is present.
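One common way to code for eventual consistency is a replica read with a master fallback. A minimal sketch, where two plain dicts stand in for the replica and master connections (purely illustrative):

```python
# Sketch of tolerating replication lag: try the cheap local replica
# first; if the row hasn't arrived yet, fall back to the master.
master = {"order:42": "paid"}   # write already applied on the master
replica = {}                    # ...but not yet replicated here

def read_with_fallback(key):
    """Prefer the replica read, but never assume the data is present."""
    value = replica.get(key)
    if value is None:
        value = master.get(key)  # consistent, but slower and costlier
    return value

print(read_with_fallback("order:42"))  # "paid", served from the master
```

The trade-off is that fallback reads put load back on the master, so they should be reserved for data where staleness actually matters.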

I can't possibly do your question real justice on this site. Your best bet is to read a book; I highly recommend this one:

MySQL High Availability - ISBN-13: 978-0-596-80730-6


My quick answer to this would be to use a job queue system like Gearman to offload the sync work to. That way it doesn't affect page load or user experience. You simply create a Gearman job; it goes into the Gearman queue and gets processed as soon as a worker is available.

This also seems like a much better, near-instant alternative to a cron, because the job is added to the queue immediately and handled almost immediately too. And since you seem to want to replicate only selected data, I don't see how MySQL replication would be of much assistance.
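The queue-offload pattern looks roughly like this. As a stand-in for Gearman (which needs a running job server), this sketch uses an in-process `queue.Queue` with a background worker thread; in a real deployment the `jobs.put(...)` call would be a Gearman client submitting a background job, and the worker loop would be a Gearman worker process.

```python
# Page handler enqueues a sync job and returns immediately; a
# background worker performs the slow remote write. Names illustrative.
import queue
import threading

jobs = queue.Queue()
synced = {}  # stands in for the remote/central database

def worker():
    while True:
        job = jobs.get()
        if job is None:          # shutdown sentinel
            break
        key, value = job
        synced[key] = value      # the slow remote write happens here
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

jobs.put(("user:7", "updated"))  # during the page load: enqueue and move on
jobs.join()                      # (demo only) wait for the worker to finish
jobs.put(None)
t.join()
print(synced["user:7"])  # "updated"
```

The key property is that the page handler's cost is just the enqueue; the latency of the remote write is paid by the worker, not the user.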

I have worked with Gearman before (including with PHP), and it was a great solution for handing work off to be completed elsewhere when the page load didn't need to wait for the job to finish.

Admittedly it's not quite as simple as I've made it seem, since you do need to set up and learn Gearman, but it is a very handy tool.

Hope this helps!


I was wondering whether you are using SQL Server as your back-end or something else. I am pretty sure that with SQL Server you can use SQL Replication http://technet.microsoft.com/en-us/library/ms151198.aspx to achieve the desired goal. At that point your local apps would access their own SQL instance, while each instance would "replicate" and "sync" its data with the main DB server. The end result is that your central DB is always up to date and holds aggregated data from each and every satellite SQL server. (Though please don't quote me on this one... I am not a SQL expert.)

(Sorry, I just realized you are using PHP/MySQL... and probably favor open source... However, I think this is worth looking into.)


I have implemented database synchronization between the client and server sides of a PHP application using the following idea: http://vitana-group.com/article/php/data-synchronization
