
PHP regex: extract the text of the rel attribute within links

Is there a really easy way to grab the text of a rel attribute, e.g. from:

<a href='#' rel='i want this text here'></a>.

I have tried this morning with regex but am having no luck.


Do not use regular expressions for irregular languages like HTML. You can achieve that using XPath. Example:

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
// Select every <a> element that carries a rel attribute
$nodelist = $xpath->query('//a[@rel]');


Unless the HTML is 100% static and controlled by you, I recommend you use an HTML parser, either the built-in DOMDocument or the PHP Simple HTML DOM Parser. It's more effort to set up than a simple regex, but it will work much more reliably across variations such as these:

 <a href='#' rel="i want this text here"></a>
 <a href='#' REL="i want this text here"></a>
 <a rEL='i want this text here' href='#' ></a>
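
As a minimal sketch of why a parser copes with these variations: DOMDocument's HTML parser normalizes attribute names to lower case and accepts either quote style, so a single XPath query should cover all three links. The variable names ($html, $rels) are only illustrative, and the libxml error suppression is an assumption about how you would want to handle warnings on messy markup.

```php
<?php
// The three variations from above: different quote styles, attribute
// order, and attribute-name casing.
$html = <<<HTML
<a href='#' rel="i want this text here"></a>
<a href='#' REL="i want this text here"></a>
<a rEL='i want this text here' href='#' ></a>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Attribute names are lower-cased by the HTML parser, so @rel is enough.
$rels = [];
foreach ((new DOMXPath($doc))->query('//a[@rel]') as $link) {
    $rels[] = $link->getAttribute('rel');
}
print_r($rels);
```

A regex would need explicit handling for each of these differences, whereas here they are absorbed by the parser.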


This should work:

preg_match_all('%<a[^>]+rel=("([^"]+)"|\'([^\']+)\')[^>]*>%i', $html, $matches);
print_r($matches);
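
Note that this pattern captures the value in group 2 for double-quoted attributes and in group 3 for single-quoted ones, so the raw $matches array needs a little post-processing. A rough sketch of one way to flatten it, assuming the same pattern and using PREG_SET_ORDER ($html and $rels are illustrative names):

```php
<?php
$html = <<<HTML
<a href='#' rel="first value"></a>
<a href='#' REL='second value'></a>
HTML;

// Same pattern as above; PREG_SET_ORDER yields one array per matched link.
preg_match_all(
    '%<a[^>]+rel=("([^"]+)"|\'([^\']+)\')[^>]*>%i',
    $html,
    $matches,
    PREG_SET_ORDER
);

$rels = [];
foreach ($matches as $m) {
    // Group 2 is filled for double quotes, group 3 for single quotes.
    // Group 3 is absent from the array when the double-quoted branch matched.
    $rels[] = $m[2] !== '' ? $m[2] : ($m[3] ?? '');
}
print_r($rels);
```

This still shares the general weakness of regex-based HTML handling: unquoted attribute values or a `>` inside another attribute would not be matched correctly.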


As said by others, you should avoid using regex for parsing HTML, as it's not a regular language. But if you are sure of the structure of the HTML, you can use a regex. The following program will extract the text you want:

<?php
$a = "<a href='#' rel='i want this text here'></a>";

if(preg_match("{<a href.*?rel='(.*?)'.*?>}",$a,$matches)) {
        echo $matches[1]; // prints i want this text here
}
?>


As the other posters have pointed out, it's really a bad idea to use regex for HTML parsing: too many things can go wrong, and you'll end up doing more maintenance. (See Pekka's comment!)

To add some value here, I have posted a full example of getting every rel attribute:

<?php
$html = "<a href='#' rel='i want this text here'></a>";

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$refAttributes = $xpath->query("//a[@rel]");
// ^^ This means: get every <a ...></a> that has a rel attribute

foreach($refAttributes as $refAtt) {
    var_dump($refAtt->getAttribute("rel"));
}

And for additional reading one can try:

http://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html

http://kore-nordmann.de/blog/0081_parse_html_extract_data_from_html.html
