Href URL matching [duplicate]
Possible Duplicate:
Grabbing the href attribute of an A element
I'm trying to match the following in the page source:
<a href="/download/blahbal.html">
I looked at another question on this site and used this regex:
'/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i'
which returns all the href links on the page, but it cuts off the .html on some links.
Any help would be greatly appreciated.
Thank you
First use the method described here to retrieve all hrefs, then you can use a regex or strpos to filter out those that don't start with /download/.
The reason why you should use a parser instead of a regex is discussed in many other posts on Stack Overflow (see this). Once you have parsed the document and collected the hrefs, you can filter them with simple string functions.
A little code:
$dom = new DOMDocument;
// $html contains your HTML string
$dom->loadHTML($html);
// at the end of the loop this will be populated with the filtered hrefs
$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    // look for an href attribute
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');
        // filter out hrefs which don't start with /download/
        if (strpos($href, "/download/") === 0) {
            $hrefs[] = $href; // store href
        }
    }
}
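As a variation on the loop above, the prefix check can probably be pushed into an XPath query, so that only matching anchors come back. This is a minimal sketch, not the method from the linked answer: the sample markup in $html is made up for illustration, and the libxml_use_internal_errors call is an assumption that the real pages contain sloppy HTML that would otherwise raise warnings.

```php
<?php
// Hypothetical sample markup, standing in for the real page source.
$html = '<html><body>'
      . '<a href="/download/blahbal.html">one</a>'
      . '<a href="/about.html">two</a>'
      . '<a href="/download/my file.html">three</a>'
      . '<a name="anchor">no href</a>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// starts-with() is an XPath 1.0 function: select only <a> elements
// whose href attribute begins with /download/.
$hrefs = array();
foreach ($xpath->query('//a[starts-with(@href, "/download/")]') as $node) {
    $hrefs[] = $node->getAttribute('href');
}

print_r($hrefs);
```

The third sample link contains a space, which a character class such as [^"\'\s>]+ would stop at. Whether that is what truncates the links in the question is only a guess, but the parser returns the attribute value whole either way.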