
Can cron jobs be used to simulate multi-threading with PHP?

I have a MySQL database table filled with 1000+ records, let's say 5000 records. Each record has a processed boolean flag, defaulting to false (0). What I'd like to do is have a PHP script run on cron every minute. Its code would be something like this:

<?php
process();

function process()
{
   global $db; // $db: your database wrapper object

   $sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";
   $result = $db->query($sql);

   if (! $result->has_rows())
      die;

   $id = $result->getSingle('id');
   processItem($id); // Will set processed to 1 after processing is done
   process();
}
?>

It should be pretty clear what the above code does: it gets the id of the next unprocessed record, processes it, and then calls process() again, repeating until there are no more items left to process, at which point execution stops.

By putting this script on cron to run every minute, I hope to have multiple instances all working on the items at once, so that rather than one item at a time, 5-10+ items could be processed simultaneously.

1) Is this going to work the way I'm planning it? Any suggestions for improvements / things to watch out for?

2) Should the script keep a counter of the number of running instances, so that whenever a cron job starts it checks the counter and, if 50 (?) instances are already running, exits without processing? That might keep the server from crashing under too many processes using up too much memory. Any thoughts?


I have a couple of things to say:

Firstly, you are using recursion to process multiple rows. This could lead to problems if you recurse too deeply; use a simple loop instead.

Secondly, do you know whether this code actually benefits from being run multiple times at once? If the machine is CPU-bound it might not gain anything from another instance. I suggest you test manually how many concurrent instances work best; more does not always make things faster, and in some cases can actually slow everything down.

Finally, I would certainly put a limit on how many of these scripts can run concurrently. This can be achieved simply by ensuring each script runs for no longer than, say, 5 minutes. Or you can keep a count of active scripts and ensure it doesn't go over the maximum number you determined in my second suggestion.
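If it helps, here is a minimal sketch of the limiting approach using lock files rather than a counter; the slot count of 5 and the /tmp paths are my own placeholders, not anything from the question:

<?php
// Sketch: cap the number of concurrent workers by taking a
// non-blocking exclusive lock on one of a fixed pool of lock files.
// MAX_WORKERS and the lock file paths are illustrative assumptions.
define('MAX_WORKERS', 5);

$lock = null;
for ($i = 0; $i < MAX_WORKERS; $i++) {
    $fh = fopen("/tmp/worker-$i.lock", 'c');
    if ($fh && flock($fh, LOCK_EX | LOCK_NB)) {
        $lock = $fh; // we own slot $i; the lock is released when this process exits
        break;
    }
    if ($fh) {
        fclose($fh);
    }
}

if ($lock === null) {
    exit; // MAX_WORKERS instances are already running, so this one bails out
}

// ... run the processing loop here ...
?>

The advantage over a counter stored in the database is that a crashed script can never leave the count stuck too high: the OS releases the lock automatically when the process dies.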

Edit: I've added some more information about the problems recursion can cause. Each time you recursively call a function, extra space is used up on the stack. This space stores the local variables as well as the return address (so execution can resume where it left off when the called function exits). The stack only has a finite amount of space, so eventually your program will crash with a stack overflow. Try running this simple program:

function a($i) { 
   print $i . "\n"; 
   a($i + 1);
}
a(0);

On my system it crashes PHP after 608739 iterations. This number could be a lot smaller in a more complex function. A simple loop does not have these overheads, and therefore does not have this problem.
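For comparison, the iterative version of the same counter never grows the stack, so it will print forever (until you kill it) instead of crashing:

function a_loop() {
   for ($i = 0; ; $i++) {
      print $i . "\n"; // same output, but constant stack space
   }
}
a_loop();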


The recursion doesn't seem necessary at all and, as bramp said, could lead to problems. Why not just:

$sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";

while ( ($result = $db->query($sql)) && $result->has_rows() ) {
   processItem( $result->getSingle('id') );
}

However, I foresee larger problems here. If you are going to be running this script every minute, what mechanism do you have in place to prevent overlap with previously started instances that may still be running? You may end up processing the same ID more than once.
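One way to avoid that double processing (my own suggestion, not something from the question) is to claim a row atomically with an UPDATE before touching it, so two overlapping instances can never grab the same ID. A sketch, reusing the question's hypothetical $db wrapper and assuming you add a claimed_by column to the table:

// Atomically claim one unprocessed row; MySQL supports
// UPDATE ... ORDER BY ... LIMIT, so only one instance can win a given row.
$token = uniqid(getmypid() . '-', true);

$db->query("UPDATE items SET claimed_by = '$token'
            WHERE processed = '0' AND claimed_by IS NULL
            ORDER BY id ASC LIMIT 1");

$result = $db->query("SELECT id FROM items WHERE claimed_by = '$token'");
if ($result->has_rows()) {
    processItem($result->getSingle('id')); // sets processed to 1 when done
}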

If you absolutely require a (pseudo) multi-threaded approach, I suggest the following:

  1. Grab a range of, or all of, the unprocessed IDs, not just one at a time.
  2. Using the curl_multi_ family of functions, pass subsets of those results (groups of n IDs) to another script that does the actual processing.

This method gives you greater control over the whole process and avoids the unnecessary one-at-a-time queries for the unprocessed IDs.
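As a rough illustration of step 2, here is a curl_multi sketch; the worker URL (process_items.php) and the batch size of 10 are placeholders of mine, not part of the answer:

<?php
// Sketch: fan groups of IDs out to a worker script, with all requests
// running concurrently via the curl_multi_ functions.
$ids = array(/* the unprocessed IDs fetched in step 1 */);

$mh = curl_multi_init();
$handles = array();

foreach (array_chunk($ids, 10) as $batch) {
    $ch = curl_init('http://localhost/process_items.php?ids=' . implode(',', $batch));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all the transfers until every one has finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // avoid busy-waiting
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>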


I started a project to solve exactly the same problem. It can continually run a script, and run more instances in parallel if there is high demand. If there is nothing to do then it will wait a specified interval before running an instance of the script.

If you're interested then have a read through some use cases: www.4pmp.com/fatcontroller/

