Migrating large forum data from one system to another

I need to migrate a fairly large forum installation from one system (custom, Microsoft SQL Server) to another (vBulletin, MySQL). The target system has a number of different import scripts that I plan to look at for ideas, but I'm hoping to get some recommendations here on some aspects of the migration.

Worth noting are:

  • The original system uses Microsoft SQL Server. The new system uses MySQL (the schemas are not similar)
    • Can PHP talk to SQL Server? If not, perhaps use Java to do the migration since it can talk to both? (the new system is written in PHP, and we'd like to keep all code in one language if possible)
  • The amount of data is on the order of 13 million posts and 650k members
    • If necessary, we can prune members (removing those that haven't logged in in the last X years and don't have posts in active threads)
    • If necessary, we can prune threads (removing those that haven't had new posts in the last X years)
    • Even after pruning, we're still likely to have on the order of 7.5 million posts

The things that I think might cause problems include:

  • I'm not sure if I can keep ids (message or user) from the old system in the new.
  • I obviously can't load all 13m records into memory from the old database, process them, and then write to the new database
  • I want to be able to run a second data migration later to get any new data since the original import (so we can run a long running import against a backup of the main database, then run an "everything else" import when it's time to switch off the old system)
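One way to address the memory concern is to read the source table in bounded batches, resuming each query from the last id seen (keyset pagination) rather than loading everything at once. The following is a minimal sketch under stated assumptions, not a tested migration tool: it uses Python's built-in sqlite3 as a stand-in for the source database, and the `posts` table and its `post_id` and `body` columns are hypothetical names. Against SQL Server the query would use `TOP` or `OFFSET ... FETCH` instead of `LIMIT`.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows in primary-key order, at most batch_size rows at a time."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT post_id, body FROM posts "
            "WHERE post_id > ? ORDER BY post_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last row seen, so only one batch is in memory.
        last_id = rows[-1][0]


# Demo with an in-memory stand-in for the old database (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT)")
source.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)

batch_sizes = [len(batch) for batch in iter_batches(source, 4)]
print(batch_sizes)
```

Because each query filters on the indexed key instead of skipping rows with an offset, the cost per batch should stay roughly constant even deep into a table with millions of posts.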

My current thought is to do something like:

  • Add an old_user_id column to the new database user table
  • Migrate the users from the old database, putting their original user id in the new column
  • Load threads from the old database in batches of X, and insert them into the new database, using the old_user_id -> new_user_id mapping in the user table
  • When migrating users, load them in order of creation date and keep track of the most recent date (in another db table). This will allow the system to pick up where it left off the next time we run it
  • Same thing applies to threads
  • When running a migration, first fetch everything "created before the stored date, but modified after it" and update the corresponding records in the new database with the modified information. Once that is done, handle the things created since then

My apologies for such an open-ended question. There are a lot of factors involved, and I don't have specific questions yet. I'm really just looking for thoughts and suggestions from folks who have had to handle something similar in the past: ideas on the best way to handle things, things I'm missing, or edge cases I should pay attention to.

Edit: I can't figure out how to make this a wiki. If someone can convert it for me or tell me how, I'd be happy to do so. It obviously doesn't have a single, correct answer, so it probably should be marked as such.


Can PHP talk to SQL Server? Yes.


The first thing you'd have to do is compare the data structure of your database with that of the CMS you will be using; then you can determine which fields map directly and which need to be changed.

I assume you would dump the entire database to SQL. After that, it is mostly a matter of search and replace to change data types or table names for better compatibility when importing.

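Also, you might want to check out http://php.net/manual/en/book.mssql.php regarding your PHP vs. MSSQL question. That could save you a lot of trouble. Note that the mssql extension documented there was removed in PHP 7.0; on newer PHP versions, Microsoft's sqlsrv and PDO_SQLSRV drivers are the usual way to connect.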