开发者

Splitting 20 million record database

I have a US company database which is 20 millions records. Firstly they is no budget for a massive R开发者_如何学CAM database server. So I think I am going to have to split the db into parts, 4 parts grouped by State.

My question is how is the best way to handle this with PHP, I am thinking get the users query find the State and then point to the relevant db? Any thoughts?


I think you need to look at the MySQL partitioning


sounds like you might want to consider sharding.

Not sure if you are using an ORM for data access, but some of them support sharding. Some info on sharding for php and mySQL here:

http://highscalability.com/database-sharding-netlog-mysql-and-php

just realised - link missing to the actual article in last url... try here: http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/


You don't need PHP for all these operations. Maybe to generate SQL code. It's better to make SQL scripts that copy data from the original tables into the new. See "INSERT ... SELECT ..." and "CREATE TABLE ... AS SELECT ..." if you are not familiar with them yet.

If you have MySQL >=5.1, then try partitioning table so that any request hits only 1 partition.

  • If users need information only on 1 state, partition it by state. There can be a lot of partitions without overhead work for you. If users can see only a certain time frame, like month graphs in Webalizer, partition by months, and so on.

Also consider creating aggregate tables. Let me elaborate: in data warehouses there is a distinction between metrics and attributes.

  • An attribute is a column that tells where, when, what, what kind of.
  • A metric tells how much, how many.

An aggregate table has less level of detail: either less attributes (no geographical info, or no product info), or some steps upper the attributes in the full table (just state instead of city+state, year-month instead of date, and so on).

And the last: make sure your users really need the detailed old data. Some of the data becomes irrelevant in a couple of years. For instance, website referrers have no meaning after 1,5-2 years, since most of the websites change. The 2-years-old website traffic data can be just a number of daily/monthly graphs.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜