How to limit concurrent connections used by cURL

I made a simple web crawler using PHP (and cURL). It parses roughly 60,000 HTML pages and retrieves product information (it's a tool on an intranet).

My main concern is the number of concurrent connections. I would like to limit it so that, whatever happens, the crawler never uses more than 15 concurrent connections.

The server blocks an IP whenever it reaches the limit of 25 concurrent connections per IP, and for some reason I can't change that on the server side, so I have to find a way to make my script never use more than X concurrent connections.

Is this possible?

Or maybe I should rewrite the whole thing in another language?

Thank you, any help is appreciated!


Well, you can use curl_setopt($ch, CURLOPT_MAXCONNECTS, 15); to limit the number of connections cURL keeps open. Note that this caps the connection cache rather than queuing requests, so if that doesn't do it for you, you might also want to write a simple connection manager.
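One way such a connection manager could look is a rolling window on top of curl_multi: never add more than 15 handles to the multi handle at once, and start the next URL only when a transfer finishes. This is only a sketch under assumptions. fetchAllLimited() is a made-up helper name, the URLs are placeholders, and failed transfers are simply stored as null.

```php
<?php
// Sketch: fetch many URLs with at most $maxConcurrent transfers in flight.
// fetchAllLimited() is a hypothetical helper, not part of the cURL extension.
function fetchAllLimited(array $urls, int $maxConcurrent = 15): array
{
    $results = [];
    $queue = $urls;   // URLs that have not been started yet
    $active = 0;      // transfers currently in flight
    $mh = curl_multi_init();

    // Start transfers until the window is full or the queue is empty.
    $fill = function () use (&$queue, &$active, $mh, $maxConcurrent) {
        while ($active < $maxConcurrent && $queue) {
            $url = array_shift($queue);
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember which URL this handle is for
            curl_multi_add_handle($mh, $ch);
            $active++;
        }
    };

    $fill();
    while ($active > 0) {
        curl_multi_exec($mh, $running);

        // Collect finished transfers and free their slots.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null; // failed transfer
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $active--;
        }

        $fill();
        // Wait for socket activity instead of spinning; back off briefly on -1.
        if ($active > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

Calling fetchAllLimited($productUrls, 15) would then return an array keyed by URL. Because a new handle is only added when $active drops below the limit, the script should never hold more than 15 transfers open at once, provided it is the only crawler process running.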


Maybe write a simple connection table:

target_IP   |   active_connections
1.2.3.4     |   10
4.5.6.7     |   5

Each cURL call would increase the number of connections for its target IP, and each close would decrease it.

You could store the table in a MySQL table, or in Memcache for speed.

When you encounter an IP that already has its maximum number of connections, you would have to implement a "try later" queue.
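The table and the "try later" queue could be sketched roughly like this. The class and method names (ConnectionTable, acquire, release, nextRetry, activeFor) are hypothetical, and the counters live in a plain PHP array, which assumes a single crawler process. With several processes the same logic would need shared storage such as the MySQL table or Memcache mentioned above.

```php
<?php
// Sketch of the connection table: per-IP counters plus a "try later" queue.
// ConnectionTable and its method names are hypothetical.
class ConnectionTable
{
    private $limit;
    private $active = []; // target IP => number of open connections
    private $retry = [];  // URLs deferred because their IP was at the limit

    public function __construct(int $limit)
    {
        $this->limit = $limit;
    }

    // Reserve a slot for $ip. If the IP is at its limit, queue the URL for later.
    public function acquire(string $ip, string $url): bool
    {
        $count = $this->active[$ip] ?? 0;
        if ($count >= $this->limit) {
            $this->retry[] = $url;
            return false;
        }
        $this->active[$ip] = $count + 1;
        return true;
    }

    // Call when a connection to $ip is closed.
    public function release(string $ip): void
    {
        if (($this->active[$ip] ?? 0) > 0) {
            $this->active[$ip]--;
        }
    }

    // Next deferred URL, or null if the queue is empty.
    public function nextRetry(): ?string
    {
        return array_shift($this->retry);
    }

    public function activeFor(string $ip): int
    {
        return $this->active[$ip] ?? 0;
    }
}
```

The idea is to call acquire() before each cURL call and release() after each close, then drain nextRetry() whenever a slot frees up.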
