Optimal language for asynchronous processing of information
Before getting to the heart of the matter, I should outline the current scenario. I currently have a PHP script that runs through the CLI to process some data. It goes something like this:
- The user submits some data through the website and it is stored in a database
- A PHP script running through the CLI cycles through all of the data in the database every 5 minutes or so. It reads the information submitted by the user, processes it, then creates multiple other entries in other databases. Often it also has to post something over HTTP using file_get_contents.
- I cannot always have the information processed simply when the user submits it for logistical reasons (this is non-negotiable)
The code for it would look something like this:
$q = mysql_query("SELECT username, infoA, infoB FROM data");
while ($r = mysql_fetch_array($q))
{
    // Each row is processed sequentially; the loop blocks on every call
    some_function($r['username'], $r['infoA']);
    another_function($r['infoB']);
}
The functions "some_function" and "another_function" are where all the actual processing of the information occurs. Here is the issue: Often, there are a lot of entries to cycle through and there is far too large of a delay between the time the first entry is processed and the last one. I need all of the data processed with minimal delay between the first and last entry. The functions themselves are optimized well and run pretty fast so that is not the issue. Since future function calls do not need to reference data from previous function calls, I am thin开发者_开发百科king that I need the functions to be executed asynchronously. This way, the script can cycle to the next entry without waiting for the first entry to be done processing.
The PHP CLI script I created is primarily for testing purposes. It works well for preliminary testing, but once I launch, the quantity of data will be significantly greater. What is the ideal language for handling a task such as this? I certainly need the functions to be executed asynchronously. However, if there are too many asynchronous calls at the same time, it might overload the system, or the information might not be processed properly. Hence, there must also be an efficient way to handle this. Can I still do this in PHP, or should I move to something else, and why?
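For illustration, here is a minimal sketch of what I mean by capped parallelism in PHP itself, assuming the pcntl extension is available on the CLI. The $maxChildren value is a made-up throttle, and in practice each forked child would need to open its own database connection rather than reuse the parent's:

<?php
$maxChildren = 10; // assumed cap on concurrent child processes
$children = 0;

$q = mysql_query("SELECT username, infoA, infoB FROM data");
while ($r = mysql_fetch_array($q))
{
    // Throttle: if the cap is reached, wait for one child to finish
    if ($children >= $maxChildren) {
        pcntl_wait($status);
        $children--;
    }
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: process one entry, then exit
        // (a real child should open a fresh DB connection before touching MySQL)
        some_function($r['username'], $r['infoA']);
        another_function($r['infoB']);
        exit(0);
    }
    $children++;
}
// Reap any children still running
while ($children-- > 0) {
    pcntl_wait($status);
}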
The requirements are that I can make HTTP requests with GET data (I do not need to wait for the results), and that I can use MySQL and memcached.
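As an example of the fire-and-forget HTTP part: a GET request can be written to a raw socket and the connection closed without reading the response. This is only a rough sketch; async_get is a hypothetical helper name:

<?php
// Hypothetical helper: send a GET request and return without
// waiting for (or reading) the response body.
function async_get($host, $path)
{
    $fp = fsockopen($host, 80, $errno, $errstr, 5); // 5-second connect timeout
    if (!$fp) {
        return false;
    }
    $request  = "GET $path HTTP/1.1\r\n";
    $request .= "Host: $host\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($fp, $request);
    fclose($fp); // do not wait for the reply
    return true;
}

async_get("example.com", "/notify?user=alice");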
Realistically speaking, I will hire programmers to work on this. So, I am really looking for as much information as possible to determine exactly what skill sets I should look for in the programmers.
Also, please do not recommend getting a faster server. I am focused on optimizing the software end of this. Improvements to the physical server that are required for an improved software approach might be taken into consideration. However, I am trying to avoid simply pumping money into the hardware infrastructure to compensate for software inefficiency.
I recommend using Gearman for this.
It's very easy to use from PHP with the gearman extension: http://php.net/manual/fr/book.gearman.php
Just set up a Gearman job server (gearmand) and refactor your code to delegate all the processing to it.
Your previous code can be refactored like this:
<?php
# Client code (runs in your cron script; $username, $infoA and $infoB
# come from your database loop)
$client = new GearmanClient();
$client->addServer();
# doBackground() queues the job and returns immediately with a job handle
$client->doBackground("action1", json_encode(array($username, $infoA)));
$client->doBackground("action2", $infoB);

# Worker code (runs as a separate, long-lived CLI process)
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction("action1", "some_function");
$worker->addFunction("action2", "another_function");
while ($worker->work()); // block until a job arrives, run it, repeat

function some_function($job)
{
    list($username, $infoA) = json_decode($job->workload(), true);
    // do the stuff ...
}

function another_function($job)
{
    $infoB = $job->workload();
    // do the stuff ...
}
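To tie this back to the cron loop from the question, the client side can read the rows from MySQL and enqueue one background job per row. This is only a sketch, reusing the table and column names from the question (the job server address is assumed to be the gearmand default):

<?php
# client.php -- run from cron; enqueues one background job per row
$client = new GearmanClient();
$client->addServer("127.0.0.1", 4730); // default gearmand host/port

$q = mysql_query("SELECT username, infoA, infoB FROM data");
while ($r = mysql_fetch_array($q))
{
    $client->doBackground("action1", json_encode(array($r['username'], $r['infoA'])));
    $client->doBackground("action2", $r['infoB']);
}

Because jobs wait in the queue until a worker picks them up, you control concurrency by deciding how many worker processes to run in parallel (for example, launching several copies of the worker script under a process supervisor). That directly addresses the concern about too many simultaneous calls overloading the system.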