How to run a big update on a live database?
A web project I'm coding for uses a complex CSV-to-MySQL converter to build its database. To update the DB content with the latest changes from the CSVs, the converter is run: it truncates the relevant tables (but leaves alone the ones that are filled through the website) and fills them again with the data from the CSVs.
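For context, the heart of the converter is basically a truncate-and-reload per table. A minimal sketch of that step, assuming mysql-connector-python and made-up table/column names (products, sku, name, price) and placeholder credentials:

```python
import csv
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app",
                               password="...", database="appdb")
cur = conn.cursor()

# Wipe the CSV-backed table (tables filled through the website are left alone).
cur.execute("TRUNCATE TABLE products")

# Refill it from the latest CSV export.
with open("products.csv", newline="") as f:
    for row in csv.reader(f):
        cur.execute(
            "INSERT INTO products (sku, name, price) VALUES (%s, %s, %s)",
            row,
        )

conn.commit()
cur.close()
conn.close()
```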
Yes, this is not a great process but there are good reasons why this approach was chosen over a standard "working on the actual db" approach.
What I'm struggling with is working out the best way to run this update process without hurting the user experience too much. A few numbers to keep in mind:
1) This process has to be run regularly, in the range of every few weeks / once a month
2) The DB converter currently takes about an hour and will probably take up to 15 hours in the future, at least if the predictions about database growth are right (yes, ouch!)
3) The SQL dump of the full database is currently under 20 MB (which allows easy import through phpMyAdmin) but will break that barrier soon enough. I guess this shouldn't be a problem, as I can just upload it over SSH instead.
Here are some of the alternatives I thought of. All of them use a separate database with global settings (these settings are checked on every read/write on the site; a rough sketch of that check is below). Alternative 2 seems to be the worst, as it blocks read access for the whole duration of the conversion, which can be quite long as I said. All of them block write access for about the same length of time, which is fine though; it does not prevent users from signing up or anything critical like that. I'm quite curious about the feasibility of the third alternative, as it theoretically allows me to shorten the downtime of the read functionality since I don't have to upload a big dump.
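For reference, the global-settings check looks roughly like this. This is only a sketch: the settings table layout, database names, and credentials are assumptions; only the readable/writeable flag names match my setup:

```python
import mysql.connector

def flags(settings_conn):
    """Read the maintenance flags from the separate settings database."""
    cur = settings_conn.cursor()
    cur.execute(
        "SELECT name, value FROM globalsettings_booleans "
        "WHERE name IN ('readable', 'writeable')"
    )
    result = dict(cur.fetchall())
    cur.close()
    return result

# Every page load checks these before touching the main database.
settings_conn = mysql.connector.connect(host="localhost", user="app",
                                        password="...", database="settings")
if not flags(settings_conn).get("readable", 0):
    print("Site is in maintenance mode, show a static page instead.")
```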
Has anyone done something like this? I'd appreciate superior alternatives if they're out there, or any feedback on how to improve these and whether to choose 1 or 3. Thanks in advance :)
Alternative 1
1) Set globalsettings_booleans_writeable to 0
2) Download current DB (SQL dump)
3) Import downloaded DB locally
4) Run converter (in update mode) on local database
5) Export local DB
6) Set globalsettings_booleans_readable to 0
7) Import exported DB online
8) Set globalsettings_booleans_readable to 1
9) Set globalsettings_booleans_writeable to 1
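Alternative 1 boils down to something like the following sketch. Here set_flag() and run_converter() are placeholders for the settings update and for the existing converter, and the hosts, credentials, and database names are made up:

```python
import subprocess

def set_flag(name, value):
    """Placeholder: update globalsettings_booleans_* in the settings DB."""
    ...

def run_converter(database):
    """Placeholder: run the existing CSV-to-MySQL converter in update mode."""
    ...

set_flag("writeable", 0)                                     # step 1

# Steps 2-3: pull the live DB down and load it into a local copy.
with open("live.sql", "w") as dump:
    subprocess.run(["mysqldump", "-h", "live.example.com", "-u", "app",
                    "-pSECRET", "appdb"], stdout=dump, check=True)
subprocess.run(["mysql", "-u", "app", "-pSECRET", "appdb_local"],
               stdin=open("live.sql"), check=True)

run_converter("appdb_local")                                 # step 4

# Step 5: export the converted local DB.
with open("converted.sql", "w") as dump:
    subprocess.run(["mysqldump", "-u", "app", "-pSECRET", "appdb_local"],
                   stdout=dump, check=True)

set_flag("readable", 0)                                      # step 6
subprocess.run(["mysql", "-h", "live.example.com", "-u", "app", "-pSECRET",
                "appdb"], stdin=open("converted.sql"), check=True)  # step 7
set_flag("readable", 1)                                      # step 8
set_flag("writeable", 1)                                     # step 9
```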
Alternative 2
1) Set globalsettings_booleans_writeable to 0
2) Set globalsettings_booleans_readable to 0
3) Run converter (in update mode) on live database
4) Set globalsettings_booleans_readable to 1
5) Set globalsettings_booleans_writeable to 1
Alternative 3
1) Set globalsettings_booleans_writeable to 0
2) Create a remote copy of the database
3) Run converter (in update mode) on remote copy
4) Set globalsettings_booleans_readable to 0
5) Replace remote original with remote copy (easy?)
6) Set globalsettings_booleans_readable to 1
7) Set globalsettings_booleans_writeable to 1
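For step 5, I'm thinking MySQL's RENAME TABLE could do the swap: as far as I know there is no single "rename database" statement, but RENAME TABLE is atomic even across databases, so the read-blocking window could be very short. A rough sketch of what I have in mind, with a made-up table list and database names (the converter is assumed to have written into a copy called appdb_staging):

```python
import mysql.connector

# Tables the converter rebuilds; everything filled through the website stays put.
CSV_TABLES = ["products", "categories", "prices"]

conn = mysql.connector.connect(host="live.example.com", user="app",
                               password="...", database="appdb")
cur = conn.cursor()

for table in CSV_TABLES:
    # Make room for the outgoing table, then swap in the freshly converted one.
    cur.execute("DROP TABLE IF EXISTS appdb.{t}_old".format(t=table))
    # RENAME TABLE is atomic: readers never see a half-swapped table.
    cur.execute(
        "RENAME TABLE appdb.{t} TO appdb.{t}_old, "
        "appdb_staging.{t} TO appdb.{t}".format(t=table)
    )

cur.close()
conn.close()
```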
It seems to me that a lot of exclusivity could be avoided by examining the CSV to see what records actually would result in a change to the database. It seems like the CSV generator is the actual source of the data, and the database is simply a mirror of it, right?
If so, the CSV records which result in no change could be ignored, the d/b tables left untruncated, and the remaining CSV records applied using Alternative 2, which presumably would take only a few minutes.
The major weakness of this approach is if records are deleted at the source and there's no indication that the d/b needs to have them deleted locally.
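A minimal sketch of that diffing idea, assuming each CSV row carries a unique key in its first column and using made-up table/column names (products, sku, name, price). Deletions are detected by comparing key sets, which addresses the weakness above; in practice the change comparison needs care with numeric formatting:

```python
import csv
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app",
                               password="...", database="appdb")
cur = conn.cursor()

# Load what the database currently holds, keyed by the CSV's unique column.
cur.execute("SELECT sku, name, price FROM products")
current = {row[0]: row for row in cur.fetchall()}

seen = set()
with open("products.csv", newline="") as f:
    for row in csv.reader(f):
        sku = row[0]
        seen.add(sku)
        if sku not in current:
            cur.execute("INSERT INTO products (sku, name, price) "
                        "VALUES (%s, %s, %s)", row)
        elif tuple(map(str, current[sku])) != tuple(row):
            cur.execute("UPDATE products SET name = %s, price = %s "
                        "WHERE sku = %s", (row[1], row[2], sku))

# Anything in the DB that no longer appears in the CSV was deleted at the source.
for sku in set(current) - seen:
    cur.execute("DELETE FROM products WHERE sku = %s", (sku,))

conn.commit()
```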