want to create a data warehouse... new database or just pile the tables into the existing database?
So I've got a MySQL database for a web community that is a potential stats goldmine. Currently I'm serving stats built via all sorts of nasty queries on my well-normalized database. I've run into the "patience limit" for such queries on my shared hosting, and would like to move to data warehousing and a daily cron job, thereby sacrificing instant updates for a 100-fold increase in statistical depth.
I've just started reading about data warehouses, and particularly the star schema, and it all seems pretty straightforward.
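To check that I've got the idea right, here is roughly the kind of star schema I'm picturing, as MySQL DDL. The table and column names (dim_date, dim_member, fact_daily_activity) are just made-up examples, not my real schema:

```sql
-- A made-up example star schema: one fact table, two dimension tables.
CREATE TABLE dim_date (
    date_key    INT PRIMARY KEY,        -- e.g. 20240115
    full_date   DATE NOT NULL,
    day_of_week TINYINT NOT NULL,
    month       TINYINT NOT NULL,
    year        SMALLINT NOT NULL
);

CREATE TABLE dim_member (
    member_key  INT PRIMARY KEY,
    username    VARCHAR(64) NOT NULL,
    join_date   DATE
);

-- One row per member per day, filled by the nightly cron job.
CREATE TABLE fact_daily_activity (
    date_key    INT NOT NULL,
    member_key  INT NOT NULL,
    posts_made  INT NOT NULL DEFAULT 0,
    PRIMARY KEY (date_key, member_key)
);
```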
My question essentially is - should I toss all that crap into a new database, or just pile the tables into my existing MySQL database? The current database has 47 tables, the largest of which has 30k records. I realize this is paltry compared to your average enterprise application, but your average enterprise application does not (I hope!) run on shared-hosting!
So, keeping my hardware limits in mind, which method would be better?
I really don't know much about this at all, but I assume reading Table A, calculating, then updating Table B is a lot easier in the same database than across databases, correct?
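For example, I'm assuming something like this would work either way in MySQL, since a table in another database on the same server can just be schema-qualified. The database and table names here are invented:

```sql
-- Made-up names: `community` is the live database, `warehouse` the stats one.
-- On the same MySQL server, cross-database access is just a qualified name,
-- so the nightly load looks the same whether or not the tables share a DB:
INSERT INTO warehouse.fact_daily_activity (date_key, member_key, posts_made)
SELECT DATE_FORMAT(p.created_at, '%Y%m%d'),
       p.member_id,
       COUNT(*)
FROM community.posts AS p
WHERE p.created_at >= CURDATE() - INTERVAL 1 DAY
  AND p.created_at <  CURDATE()
GROUP BY 1, 2;
```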
Should I even care how many tables my DB has?
If you just need to improve performance, create a set of pre-cooked reporting tables. Low effort and big performance gains. With the data volume you described, this won't even have a noticeable impact on the users of your web community.
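A minimal sketch of what I mean, with invented table and column names:

```sql
-- Illustrative only; table and column names are made up.
CREATE TABLE report_member_totals (
    member_id    INT PRIMARY KEY,
    post_count   INT NOT NULL,
    last_post_at DATETIME
);

-- What the nightly cron job would run: one heavy pass instead of
-- recomputing the stats on every page view.
TRUNCATE TABLE report_member_totals;
INSERT INTO report_member_totals (member_id, post_count, last_post_at)
SELECT member_id, COUNT(*), MAX(created_at)
FROM posts
GROUP BY member_id;
```

Your stats pages then read from report_member_totals with cheap primary-key lookups.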
The separate-database approach has several benefits (see below), but I don't think you will gain any of them, as you are on a shared database host.
- You can support different SLAs for the DW and the web site
- The DW and web databases can have different configurations
- The DW database is basically read-only for a large portion of the day
- The DW and web databases can have different release cycles (this is big)
- Typical DW queries (scanning large amounts of data) don't evict the web DB's cache
The number of tables in a particular database does not usually become a problem until you have thousands (or tens of thousands) of tables, and these problems usually come into play due to filesystem limits related to the maximum number of files in a directory.
You don't say what storage engine you are using. In general, you want the indexes in your database to fit into memory for good insert/update/delete performance, so the size of your key buffer or buffer pool must be large enough to hold the "hot" part of the index.
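If you want a rough sense of whether your indexes fit, something like this works via information_schema (MySQL 5.0+); the schema name 'community' is a placeholder for your database name:

```sql
-- Rough check: total index size per storage engine in your schema,
-- to compare against the buffer settings below.
SELECT ENGINE,
       ROUND(SUM(INDEX_LENGTH) / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'community'
GROUP BY ENGINE;

-- The relevant buffer settings (shared hosts rarely let you raise these):
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data + index cache
```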