
Importing 60,000 nodes

I'm using Table Wizard + Migrate module to import nodes into my Drupal installation.

I need to import around 60,000 questions and answers (both are nodes), and I thought it would be an easy task.

However, the migrate process imports only 4 nodes per minute, so it would take over 10 days to finish the import.
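The estimate follows directly from the reported rate, with no other assumptions:

```latex
\[
\frac{60\,000\ \text{nodes}}{4\ \text{nodes/min}} = 15\,000\ \text{min} = 250\ \text{h} \approx 10.4\ \text{days}
\]
```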

I was wondering whether I could make it faster by inserting directly into MySQL. But I really do need to create 60,000 nodes, and I guess Drupal stores additional information in other tables, so that doesn't seem very safe.

What do you suggest? Should I just wait 10 days? Thanks.


Table migrate should be orders of magnitude faster than that.

Are you using pathauto?

If yes, try disabling the pathauto module; it often causes big performance problems on import.

Second, if disabling pathauto doesn't help, turn off all non-essential modules you have running; some modules do crazy stuff. Rule out the other modules as sources of the problem.

Third, is MySQL query logging turned on? That can have a big performance impact (not at the level you are describing, but it's something to consider).

Fourth, install Xdebug and tail your MySQL log to see exactly what's happening.

What is your PHP memory limit?

Do you have plenty of disk space left?
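The last two questions can be answered from PHP itself. The sketch below converts the `memory_limit` shorthand (for example `128M`) to bytes and reports free disk space in the current directory; `shorthand_to_bytes()` is a hypothetical helper written for this example, while `ini_get()` and `disk_free_space()` are standard PHP functions.

```php
<?php
// Minimal sketch: report the memory limit and free disk space.
// shorthand_to_bytes() is a hypothetical helper, not part of Drupal or PHP.

/**
 * Converts php.ini shorthand such as "128M" or "1G" to bytes.
 * Returns -1 for "-1", which PHP uses to mean "no limit".
 */
function shorthand_to_bytes(string $value): int {
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // "128M" casts to 128.
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            $number *= 1024;
            // Fall through: 1G = 1024M.
        case 'M':
            $number *= 1024;
            // Fall through: 1M = 1024K.
        case 'K':
            $number *= 1024;
    }
    return $number;
}

$memory = ini_get('memory_limit');
printf("memory_limit: %s (%d bytes)\n", $memory, shorthand_to_bytes($memory));

$free = disk_free_space('.'); // Returns false if it cannot be determined.
if ($free === false) {
    echo "free disk space: unknown\n";
} else {
    printf("free disk space: %.1f GB\n", $free / (1024 ** 3));
}
```

Run it with the same PHP binary and configuration that performs the import, since the CLI and the web server often load different php.ini files.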


If you're not already doing so, use Drush to run the migration in batches. You could even wrap it in a shell script if you want it automated. Running from the command line should cut the import time considerably, and with a script it becomes an automated task you don't have to babysit.

One thing I want to point out, though: 4 nodes per minute is very low. I once had to import nodes from a CSV file using Migrate: 300 nodes with location data and 4-5 CCK fields, and it took a matter of seconds. So if you are only importing 4 nodes per minute, either your nodes are extremely complex or something fishy is going on.

What are the specs of the computer you are using for this? Where's the import source located?


This is a tough topic, but it is actually very well covered within Drupal. I don't know the ins and outs, but I do know where to look.

  • The Data Mining Drupal group has some pointers, knowledge and information on processing large amounts of data in PHP/Drupal.
  • Drupal core has batch functionality built in, called the Batch API, at your service when writing modules. For a working example, see this tutorial on CSV import.
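To illustrate the Batch API pattern: in Drupal 7 an operation callback receives its arguments followed by `&$context`, keeps its position in `$context['sandbox']`, and sets `$context['finished']` to a fraction between 0 and 1; the batch runner keeps calling it until that reaches 1. The sketch below is an assumption-laden stand-in: `example_import_rows()` and the row data are hypothetical, and `run_batch_operation()` merely imitates Drupal's runner so the code runs outside Drupal. In a real module you would register the operation with `batch_set()` instead.

```php
<?php
// Sketch of a Drupal 7 style batch operation, driven by a tiny stand-in runner.
// example_import_rows() and run_batch_operation() are hypothetical names.

/**
 * Imports $per_call rows per invocation, resuming from $context['sandbox'].
 */
function example_import_rows(array $rows, int $per_call, array &$context): void {
    if (!isset($context['sandbox']['progress'])) {
        $context['sandbox']['progress'] = 0;
        $context['sandbox']['max'] = count($rows);
    }
    $slice = array_slice($rows, $context['sandbox']['progress'], $per_call);
    foreach ($slice as $row) {
        // A real implementation would build and save a node here.
        $context['results'][] = $row['title'];
        $context['sandbox']['progress']++;
    }
    $context['message'] = sprintf('Imported %d of %d',
        $context['sandbox']['progress'], $context['sandbox']['max']);
    // The runner calls the operation again until 'finished' reaches 1.
    $context['finished'] = $context['sandbox']['max'] > 0
        ? $context['sandbox']['progress'] / $context['sandbox']['max']
        : 1;
}

/**
 * Stand-in for Drupal's batch runner. In Drupal, each call would happen in a
 * separate request, so no single request runs long enough to time out.
 */
function run_batch_operation(callable $operation): array {
    $context = ['sandbox' => [], 'results' => [], 'finished' => 1, 'message' => ''];
    $calls = 0;
    do {
        // Like Drupal, treat the operation as done unless it reports otherwise.
        $context['finished'] = 1;
        $operation($context);
        $calls++;
    } while ($context['finished'] < 1);
    return ['results' => $context['results'], 'calls' => $calls];
}

$rows = [];
for ($i = 1; $i <= 25; $i++) {
    $rows[] = ['title' => "Question $i"];
}
$outcome = run_batch_operation(function (array &$context) use ($rows) {
    example_import_rows($rows, 10, $context);
});
printf("%d rows in %d calls\n", count($outcome['results']), $outcome['calls']);
```

With 25 rows and 10 rows per call, the runner invokes the operation three times. For 60,000 rows you would normally read the next chunk from the source inside the operation rather than pass every row as an argument.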


4 nodes per minute is incredibly slow. Migrate shouldn't normally take that long. You could speed things up a bit by using Drush, but probably not enough to get a reasonable import time (hours, not days), and that wouldn't address your core problem: the import itself is taking too long. The overhead of the Migrate GUI isn't that big.

Importing directly into MySQL would certainly be faster, but there's a reason Migrate exists. Node database storage in Drupal is complicated, so it's generally best to let Drupal work it out rather than trying to figure out what goes where.

Are you using Migrate's hooks to do additional processing on each node? I'd suggest adding some logging to see what exactly is taking so long. Test it on 10 nodes at a time until you figure out the lag before doing the whole 60k.
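One way to do that logging is to time each item of a small sample and look at the slowest ones. The sketch below is generic PHP: `time_items()` is a hypothetical helper, and `$import_one` is a placeholder that only simulates work with `usleep()` (every third item is made artificially slow). In a real import you would call your actual node-saving code there, and in Drupal 7 you could send the numbers to `watchdog()` instead of printing them.

```php
<?php
// Minimal profiling sketch: time each item so the slow step stands out.
// time_items() and $import_one are hypothetical, for illustration only.

/**
 * Runs $import on each item and returns per-item durations in seconds,
 * keyed by the item's array key, slowest first.
 */
function time_items(array $items, callable $import): array {
    $durations = [];
    foreach ($items as $key => $item) {
        $start = microtime(true);
        $import($item);
        $durations[$key] = microtime(true) - $start;
    }
    arsort($durations); // Sort by duration, descending, keeping keys.
    return $durations;
}

// Placeholder for a node save; every third item sleeps 20 ms instead of 1 ms.
$import_one = function (int $n): void {
    usleep($n % 3 === 0 ? 20000 : 1000);
};

$sample = range(1, 10); // Test on 10 items before running the whole set.
$durations = time_items($sample, $import_one);
foreach (array_slice($durations, 0, 3, true) as $key => $seconds) {
    printf("item %d: %.1f ms\n", $sample[$key], $seconds * 1000);
}
printf("average: %.1f ms per item\n", array_sum($durations) / count($durations) * 1000);
```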


We had a similar problem on a Drupal 7 install. We left an import running all weekend, and it only imported 1,000 lines of the file.

The funny thing is that exactly the same import on a pre-production machine took 90 minutes.

We ended up comparing the source code (making sure both machines were at the same commit in git), the database schema (identical), and the number of nodes on each machine (not identical, but similar)...

Long story short, the only significant difference between the two machines was the max_execution_time setting in php.ini.

The production machine had max_execution_time = 30, while the pre-production machine had max_execution_time = 3000. It looks like the Migrate module has some mechanism for coping with a short max_execution_time, and it is far from optimal.

Conclusion: set max_execution_time = 3000 or more in your php.ini; it helps the Migrate module a lot.
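Before a long import you can check which limit actually applies. Note that the PHP command-line SAPI (which Drush uses) defaults `max_execution_time` to 0, meaning no limit, so the problem mostly affects imports run through the web UI. The sketch below uses the 3000-second figure from this answer as its threshold; `check_time_limit()` is a hypothetical helper, and `set_time_limit()` can fail (returning false) where the host does not allow changing the limit.

```php
<?php
// Sketch: report, and if possible raise, the time limit for this request.
// check_time_limit() is a hypothetical helper name.

function check_time_limit(int $wanted = 3000): string {
    $limit = (int) ini_get('max_execution_time');
    if ($limit === 0) {
        return 'max_execution_time is 0 (no limit)';
    }
    if ($limit >= $wanted) {
        return "max_execution_time is {$limit}s, which is at least {$wanted}s";
    }
    // set_time_limit() returns false if the limit cannot be changed.
    if (set_time_limit($wanted)) {
        return "raised max_execution_time from {$limit}s to {$wanted}s for this request";
    }
    return "max_execution_time is only {$limit}s; raise it in php.ini";
}

echo PHP_SAPI, ': ', check_time_limit(), PHP_EOL;
```

Raising the limit per request only helps code you control; changing php.ini, as suggested above, covers every request that runs the migration.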


I just wanted to add that disabling pathauto really does help. I had an import of over 22k rows; before I disabled pathauto it took over 12 hours and crashed multiple times. After disabling pathauto, the same import took only 62 minutes and didn't crash once.

Just a heads up: I created a module that disables the pathauto module before the import starts and re-enables it when the feed finishes. Here's the code in case anyone else needs this:

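/**
 * Implements hook_feeds_before_import().
 *
 * Disables Pathauto so no aliases are generated while the feed imports nodes.
 */
function YOURMODULENAME_feeds_before_import(FeedsSource $source) {
  $modules = array('pathauto');
  // By default module_disable() also disables modules that depend on Pathauto;
  // module_enable() below does not re-enable those dependents.
  module_disable($modules);
  drupal_set_message(t('The @module module has been disabled for the import.', array('@module' => $modules[0])), 'warning');
}

/**
 * Implements hook_feeds_after_import().
 *
 * Re-enables Pathauto once the feed has finished importing.
 */
function YOURMODULENAME_feeds_after_import(FeedsSource $source) {
  $modules = array('pathauto');
  module_enable($modules);
  drupal_set_message(t('The @module module has been re-enabled.', array('@module' => $modules[0])), 'warning');
}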