How to find URLs in images

I am trying to extract URLs from a large number of Google search results. Getting them from the source code is proving quite challenging, as the delimiters are not clear and not all of the URLs are in the code. Is there a tool that can extract URLs from a certain area of an image? If so, that may be a better solution.

Any help would be much appreciated.


Try using the JSON/Atom Custom Search API instead: http://code.google.com/apis/customsearch/v1/overview.html. It gives you 100 API calls per day, which you can increase to 10,000 per day if you pay.
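For what it's worth, here is a rough sketch of how the API route could look in PHP. It assumes the JSON endpoint at https://www.googleapis.com/customsearch/v1 with the key, cx, q and start query parameters, and a response whose items array carries a link field per result; check the current API docs before relying on any of that. The function names build_search_url and extract_links are made up for this example.

```php
<?php
// Build a request URL for the Custom Search JSON API.
// Assumption: $apiKey and $engineId are your own credentials (placeholders here).
function build_search_url(string $apiKey, string $engineId, string $query, int $start = 1): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key'   => $apiKey,
        'cx'    => $engineId,
        'q'     => $query,
        'start' => $start, // 1-based index of the first result to return
    ]);
}

// Pull the result URLs out of a raw JSON response body.
// Assumption: results live under "items", each with a "link" field.
function extract_links(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return []; // invalid JSON or no results
    }

    $links = [];
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $links[] = $item['link'];
        }
    }
    return $links;
}
```

You would then do something like extract_links(file_get_contents(build_search_url($key, $cx, 'some query'))) and bump start to page through results, keeping the daily quota in mind.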


Use this excellent lib: http://simplehtmldom.sourceforge.net/manual.htm

// Grab the source code
$html = file_get_html('http://www.google.com/');

// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// find() returns an array, so read the href attribute of each anchor in a loop
$urls = array();
foreach ($ret as $a) {
    $urls[] = $a->href;
}

Edit:

All "natural" search URLs seem to be in the #res div. With simplehtmldom, find the first #res element, then all the links inside it. I don't remember the exact syntax, but it should be something like this:

$ret = $html->find('div[id=res]', 0)->find('a');

or maybe

$html->find('div[id=res] a');
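If you'd rather not pull in a library, the same idea can be sketched with PHP's built-in DOMDocument and DOMXPath (this is a swapped-in technique, not simplehtmldom, and it needs the dom extension). It assumes the results still sit inside an element with id "res", which may not hold for Google's current markup; extract_result_links is a hypothetical helper name.

```php
<?php
// Return the href of every <a> nested inside the element with the given id.
// Assumption: $containerId is a plain id containing no double quotes.
function extract_result_links(string $html, string $containerId = 'res'): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid markup, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select anchors with an href anywhere below the container element.
    $xpath = new DOMXPath($doc);
    $query = sprintf('//*[@id="%s"]//a[@href]', $containerId);

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}
```

Links outside the container (navigation, footer and so on) are skipped, which is the point of scoping the search to #res.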