regex help with getting tag content in PHP

So I have this code:

function getTagContent($string, $tagname) {

    $pattern = "/<$tagname.*?>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);


    print_r($matches);

}

and then I call

$url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$html = file_get_contents($url);
getTagContent($html,"title");

but then it shows that there are no matches, even though the source of that URL clearly contains a title tag.

What did I do wrong?


Try DOM:

$url  = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$doc  = new DOMDocument();
$dom  = $doc->loadHTMLFile($url);
$items = $doc->getElementsByTagName('title');
for ($i = 0; $i < $items->length; $i++)
{
  echo $items->item($i)->nodeValue . "\n";
}
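
To try the DOM approach without a network request, the same idea can be sketched on an inline string. The helper name getTitleText and the sample markup are made up for illustration; libxml_use_internal_errors(true) is only there to keep parse warnings from sloppy real-world HTML off the screen.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <title>, or null.
function getTitleText(string $html): ?string
{
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }

    return trim($titles->item(0)->textContent);
}

// Made-up sample with the title spread over several lines.
$html = "<html><head><title>\n  Wall Street Jokes, Please\n</title></head><body></body></html>";
echo getTitleText($html), "\n";
```

Because the parser works on the document tree rather than on lines, a title that spans several lines needs no special handling here.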


The opening 'title' tag is not on the same line as its closing tag, and by default the dot does not match a newline, so your preg_match doesn't find it.

In Perl, you can add the /s switch so that the dot matches newlines and the whole input is treated as though it were on one line. PHP's preg functions are PCRE-based and accept the same s modifier.

But this is just one of the reasons why parsing XML and its variants with regular expressions is a bad idea.


Probably because the title is spread over multiple lines. You need to add the s modifier so that the dot also matches newlines.

$pattern = "/<$tagname.*?>(.*)<\/$tagname>/s";
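
The effect of the s modifier can be checked on a small string. The sample markup below is invented for the demonstration and is not taken from the page in question.

```php
<?php
// Made-up sample: the title text sits on its own line.
$html = "<head>\n<title>\nWall Street Jokes\n</title>\n</head>";

// Without s, the dot stops at newlines, so the closing tag is never reached.
$without = preg_match('/<title.*?>(.*)<\/title>/', $html, $m1);

// With s (PCRE_DOTALL), the dot also matches newlines.
$with = preg_match('/<title.*?>(.*)<\/title>/s', $html, $m2);

var_dump($without);      // number of matches without the s modifier
var_dump($with);         // number of matches with the s modifier
var_dump(trim($m2[1]));  // captured title text, surrounding newlines trimmed
```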


Write your PHP function getTagContent like this:

function getTagContent($string, $tagname) {
    $pattern = '/<'.$tagname.'[^>]*>(.*?)<\/'.$tagname.'>/is';
    preg_match($pattern, $string, $matches);
    print_r($matches);
}

It is important to use the non-greedy quantifier .*? to match the text between the opening and closing tags, so the match stops at the first closing tag. It is equally important to use the s flag (DOTALL, so the dot also matches newlines) and the i flag (case-insensitive matching).
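
As a rough sketch of why the non-greedy quantifier matters, the variant below returns the captured text instead of printing it and compares both behaviours on a string with two elements. The name getFirstTagContent, the $greedy switch, the added preg_quote call and the sample string are illustrative additions, not part of the function above.

```php
<?php
// Hypothetical variant: returns the first captured content, or null if nothing matches.
function getFirstTagContent(string $string, string $tagname, bool $greedy = false): ?string
{
    // Escape regex metacharacters (and the delimiter) in the tag name.
    $tag = preg_quote($tagname, '/');

    // Greedy (.*) versus non-greedy (.*?) capture, chosen by the caller.
    $inner = $greedy ? '(.*)' : '(.*?)';
    $pattern = '/<' . $tag . '[^>]*>' . $inner . '<\/' . $tag . '>/is';

    return preg_match($pattern, $string, $matches) === 1 ? $matches[1] : null;
}

// Made-up sample: two paragraphs, the first written in upper case.
$html = "<P>first</P>\n<p>second</p>";

var_dump(getFirstTagContent($html, 'p'));        // non-greedy: stops at the first closing tag
var_dump(getFirstTagContent($html, 'p', true));  // greedy: runs on to the last closing tag
```

With the greedy capture, everything up to the last closing tag ends up in the result, including the markup in between, which is rarely what is wanted.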
