PHP5 webpage scan (simple DOM parser || file_get_contents()+regexp)... resource-wise
I was thinking about a script that would scan 10+ websites for specific content inside a specific div. Let's say it would be moderately used, some 400 searches a day.
Which of the two in the title would better support the load, take fewer resources, and give better speed:
Creating the DOM from each of the websites and then iterating each for the specific div id,
OR
creating a string from the website with file_get_contents and then regexping the needed string.
To be more specific about what kind of operation I would need to execute, consider the following.
Additional question: Is regexp capable of finding the following occurrence of the given string:
<div id="myId"> needed string </div>
to identify the tag with the given ID and return ONLY what is between the tags?
Please answer only yes/no; if it's possible, I'll open a separate question about syntax so it's not all bundled here.
For 400 searches a day, it makes little difference which method you use, performance-wise.
In any case, the fastest method would be file_get_contents + strpos + substr, unless your locate-and-extract logic is complex enough to need more. Depending on the specific regular expression, it may or may not be faster than DOM, but it likely is. DOM will probably be a more reliable method than regular expressions, but that depends on how well-formed your pages are (libxml2 does not exactly mimic the browsers' parsing).
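A minimal sketch of that strpos/substr approach, assuming the div occurs exactly once, contains no nested </div>, and noting that the URL is just a placeholder:

<?php
// Placeholder URL; assumes the target div occurs once with no nested divs.
$html = file_get_contents('http://example.com/page.html');
$open = '<div id="myId">';
$start = strpos($html, $open);
if ($start !== false) {
    $start += strlen($open);
    // Assumes a closing </div> follows the opening tag.
    $end = strpos($html, '</div>', $start);
    echo trim(substr($html, $start, $end - $start));
}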
Yes
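Purely for illustration (the syntax question was deferred), one minimal non-greedy pattern that handles the simple, non-nested case shown above; it will break on nested divs or additional attributes:

<?php
// Illustrative only: fails on nested <div>s or extra attributes in the tag.
$html = '<div id="myId"> needed string </div>';
if (preg_match('~<div id="myId">(.*?)</div>~s', $html, $m)) {
    echo trim($m[1]); // prints: needed string
}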
Speed will depend on your server and the pages in question; either way, execution time will be negligible compared to the time spent downloading the pages to scan.
If you go with DOM / XPath, the thing is doable in 3 lines of code.
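Roughly those three lines, sketched with PHP's built-in DOM and XPath extensions (the URL is a placeholder; the @ mutes the warnings that real-world, not-quite-well-formed HTML tends to trigger):

<?php
$dom = new DOMDocument();
// Placeholder URL; @ suppresses libxml warnings on sloppy markup.
@$dom->loadHTMLFile('http://example.com/page.html');
$xpath = new DOMXPath($dom);
$node = $xpath->query('//div[@id="myId"]')->item(0);
echo $node ? trim($node->textContent) : '';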