Should I use Keep-Alive when screen-scraping?
Is it recommended to work with persistent connections when screen-scraping? What are the possible advantages/disadvantages?
I'm using PHP/cURL to scrape.
It won't make that much of a difference. Keep-Alive saves you the TCP (and TLS) handshake on every request after the first, but the real performance decision you need to make is whether to scrape concurrently, because, persistent or not, a single connection can only process one request/response at a time.
Which brings me to my next point:
I'm using PHP/cURL to scrape.
PHP is probably the wrong tool for this job. It's not very good at concurrency, or at least the default build isn't.
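To make the concurrency point concrete, here is a minimal sketch in Python rather than PHP. It is only an illustration under some assumptions: the names `fetch`, `fetch_all`, `fetch_one` and `max_workers` are made up for this example, and a thread pool is just one of several ways to run requests in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Each worker handles one request/response at a time, so throughput
    comes from having several requests in flight, not from Keep-Alive.
    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))
```

You would call it with something like `fetch_all(list_of_urls, max_workers=4)`. The `fetch_one` parameter is there so you can swap in your own downloader, for example one that reuses a persistent connection per worker. Keep `max_workers` modest so you don't hammer the site you are scraping.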