Best practice -- tracking remote data content (cURL, file_get_contents, cron, et al.)?
I am attempting to build a script that will log data that changes every second. My initial thought was "just run a PHP file that does a cURL request every second from cron" -- but I have a very strong feeling that this isn't the right way to go about it.
Here are my specifications: there are currently 10 sites I need to gather data from and log to a database, and this number will only increase over time, so the solution needs to be scalable. Each site publishes new data to a URL every second, but the page only keeps the 10 most recent lines, and up to 10 new lines can appear each time, so I need to poll every second to be sure I capture everything.
As I will also be writing this data to my own DB, there's going to be I/O every second of every day for a considerably long time.
Barring magic, what is the most efficient way to achieve this?
It might help to know that the data I am getting every second is very small, under 500 bytes.
The most efficient way is to NOT use cron, but instead to write an app that runs continuously, keeps its cURL handles open, and repeats the requests every second. That way the connections stay open (HTTP keep-alive) more or less indefinitely, and the repeated requests are very fast.
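A minimal sketch of what that could look like in PHP, assuming a MySQL table reachable via PDO; the URL list, table name, and credentials below are placeholders, and with more sites you would likely switch from this serial loop to curl_multi_exec() so the fetches run in parallel:

<?php
// One long-running process that reuses its cURL handles, so the
// keep-alive connections persist between the one-second polls.

$urls = [
    'http://site1.example.com/feed',
    'http://site2.example.com/feed',
    // ... one entry per site
];

// Placeholder DSN/credentials and table; adjust to your own schema.
$db = new PDO('mysql:host=localhost;dbname=tracker', 'user', 'pass');
$insert = $db->prepare(
    'INSERT INTO feed_lines (url, line, fetched_at) VALUES (?, ?, NOW())'
);

// Create one handle per site and keep it for the life of the process,
// so the TCP connection is reused instead of re-established every poll.
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    $handles[$url] = $ch;
}

while (true) {
    $start = microtime(true);

    foreach ($handles as $url => $ch) {
        $body = curl_exec($ch);
        if ($body === false) {
            continue; // skip this tick on error; the handle is reused next round
        }
        foreach (explode("\n", trim($body)) as $line) {
            if ($line !== '') {
                $insert->execute([$url, $line]);
            }
        }
    }

    // Sleep out whatever remains of the one-second tick.
    $elapsed = microtime(true) - $start;
    if ($elapsed < 1.0) {
        usleep((int)((1.0 - $elapsed) * 1e6));
    }
}

You would typically run a script like this under a process supervisor (or at least in screen/tmux) so it is restarted if it ever dies, rather than trying to launch it from cron.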
However, if the target servers aren't yours or a friend's, there's a good chance their owners will not appreciate you hammering them like this.