Extracting specific <a href> URLs out of the document
I think this should be elementary, but I still can't get my head around it. Let's say there is a fair number of HTML documents and I need to extract every image URL from them.
The rest of the content changes, but the base of the URL is always the same, for example: http://images.examplesite.com/images/
So I want to extract every string that contains that part. The problem is that they're always mixed in with <a href=''> or <img src=''> tags, so how could I strip those out? preg_match, probably?
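Try something like: preg_match_all('/http:\/\/images\.examplesite\.com\/images\/(.*?)["\']/i', $html_data, $results, PREG_SET_ORDER);
The character class at the end stops the match at either a double or a single quote, since your attributes may use both. With PREG_SET_ORDER, each entry of $results is one match, and index 1 of that entry is the part of the URL after the base.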
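To make the regex approach concrete, here is a minimal sketch that collects the full URLs rather than just the part after the base. The sample HTML and the variable names are made up for illustration, and the character class assumes the URLs contain no quotes, whitespace, or angle brackets.

```php
<?php
// Hypothetical sample input; in practice $html_data would hold the loaded document.
$html_data = <<<'HTML'
<a href='http://images.examplesite.com/images/a.jpg'>A</a>
<img src="http://images.examplesite.com/images/b.png">
<a href="http://other.example.org/c.gif">C</a>
HTML;

// Match the fixed base, then everything up to the next quote,
// whitespace character, or angle bracket.
$pattern = '~http://images\.examplesite\.com/images/[^"\'\s<>]*~i';
preg_match_all($pattern, $html_data, $matches);

// $matches[0] holds the full matches; drop duplicates and reindex.
$urls = array_values(array_unique($matches[0]));
print_r($urls);
```

Using ~ as the delimiter avoids having to escape every slash in the URL, which keeps the pattern easier to read.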
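You can either use an HTML DOM parser or a regular expression. With a regular expression (note that the dots in the host name must be escaped, otherwise they match any character):
preg_match_all("/http:\/\/images\.examplesite\.com\/images\/(.*?)\"/s", $str, $preg);
print_r($preg);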
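For the DOM parser route, a rough sketch using PHP's bundled DOMDocument and DOMXPath classes could look like the following. The function name extract_image_urls and the sample HTML are hypothetical, and the sketch assumes the URLs of interest only appear in href attributes of <a> tags and src attributes of <img> tags.

```php
<?php
// Hypothetical helper: collect href/src attribute values that start with $base.
function extract_image_urls(string $html, string $base): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well formed, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];
    // Select every href attribute on <a> and every src attribute on <img>.
    foreach ($xpath->query('//a/@href | //img/@src') as $attr) {
        if (strpos($attr->value, $base) === 0) {
            $urls[] = $attr->value;
        }
    }
    // Drop duplicates and reindex.
    return array_values(array_unique($urls));
}

// Made-up sample document for illustration.
$sample = '<html><body>'
    . "<a href='http://images.examplesite.com/images/a.jpg'>A</a>"
    . '<img src="http://images.examplesite.com/images/b.png">'
    . '<a href="http://other.example.org/c.gif">C</a>'
    . '</body></html>';

$found = extract_image_urls($sample, 'http://images.examplesite.com/images/');
print_r($found);
```

Compared with the regex, this only looks at real attribute values, so it does not depend on which quote style the markup uses and it ignores URLs that merely appear in text or comments.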