
PHP using preg_match to get title from article

I am having a strange problem with preg_match. I am using a regular expression that grabs the title of an article; it basically looks for the <title> tag:

preg_match('#(\<title.*?\>)(\n*\r*.+\n*\r*)(\<\/title.*?\>)#', $data, $matches)

When I print out the $matches array I get nothing. But when I try the same thing in a regular expression tester, it works fine. I have even tried putting in a string that would definitely match it in place of the $data variable, without any luck.

What am I doing wrong here?


If you still want to use regex and not DOM, here's what you can do:

if(preg_match("/<title>(.+)<\/title>/i", $data, $matches))
     print "The title is: $matches[1]";
else
     print "The page doesn't have a title tag";


Or you could use, you know, an HTML parser for HTML:

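$dom = new DOMDocument;
// loadHTML() raises warnings on malformed markup; @ suppresses them here
@$dom->loadHTML($HTML);

// item(0) is null when the document has no <title> element
$title = $dom->getElementsByTagName('title')->item(0);
echo $title ? $title->nodeValue : '';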


Works for me:

preg_match("/<title>(.*)<\/title>/is", $html, $matches);

From this source: https://gist.github.com/jeremiahlee/785770
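
A likely reason the s modifier makes the difference: without it, "." does not match newline characters, so a title that spans several lines never matches. The sketch below uses a made-up $html string (an assumption, not the asker's real data) to compare the two patterns:

```php
<?php
// Hypothetical input: the title text is split across several lines.
$html = "<html><head><title>\n  My Article\n</title></head></html>";

// Without the s modifier, "." stops at newlines, so nothing matches.
$withoutS = preg_match('/<title>(.*)<\/title>/i', $html, $m1);

// With the s modifier (PCRE_DOTALL), "." also matches newlines.
$withS = preg_match('/<title>(.*)<\/title>/is', $html, $m2);

// trim() drops the surrounding newlines and indentation.
$title = $withS ? trim($m2[1]) : null;

var_dump($withoutS); // int(0)
var_dump($title);    // string(10) "My Article"
```

If the page could contain more than one closing title tag, a non-greedy (.*?) would be the safer choice.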


You may need to backslash-quote your backslashes.

PHP's string parser removes one layer of backslashes, and then the regular-expression engine consumes another layer, so (for example) recognizing a backslash requires FOUR of them in the source code.
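
As a small illustration of the two layers of escaping, here is a sketch that counts literal backslashes in a made-up path string ($path is only an example value):

```php
<?php
// Hypothetical sample: a Windows-style path containing two backslashes.
// In a single-quoted PHP string, '\\' is one literal backslash.
$path = 'C:\\temp\\file.txt';

// Four backslashes in the source -> the PHP string holds two ->
// PCRE reads them as one escaped (literal) backslash.
$count = preg_match_all('/\\\\/', $path, $m);

echo $count, "\n"; // 2
```

Note that in a single-quoted string, sequences such as \/ or \< are passed to the regex engine unchanged, so the doubling only matters when the pattern has to match a backslash itself or when the pattern is written in a double-quoted string.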

Beyond that, you might try taking advantage of the XML recognition stuff in PHP, or do less clever string handling. Usually when regexes break, it's because you're trying to be too clever with them. Consider looking only for the opening and closing title tags, removing the tags themselves, and then stripping whitespace from what is left, and voilà: a title.

See also http://php.net/manual/en/book.simplexml.php
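
The "less clever string handling" idea could be sketched roughly like this. extractTitle is a hypothetical helper name, not a built-in, and the nullable return type assumes PHP 7.1 or later:

```php
<?php
// Hypothetical helper: find the title with plain string functions.
function extractTitle(string $html): ?string
{
    // Locate the opening tag, case-insensitively.
    $open = stripos($html, '<title');
    if ($open === false) {
        return null;
    }
    // Skip past the end of the opening tag (it may carry attributes).
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;
    // Locate the closing tag after the opening one.
    $end = stripos($html, '</title', $start);
    if ($end === false) {
        return null;
    }
    // Cut out the text in between and strip surrounding whitespace.
    return trim(substr($html, $start, $end - $start));
}

$data = "<head><TITLE lang=\"en\">\n  Hello World \n</TITLE></head>";
echo extractTitle($data), "\n"; // Hello World
```

This does no real HTML parsing, so a "<title" inside a comment or script would fool it, but it avoids regex escaping altogether.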


Try this

if (preg_match('%(<title[^>]*>)(.+?)(</title[^>]*>)%is', $data, $matches)) {
    // Group 2 holds the text between the opening and closing tags
    $title = trim($matches[2]);
} else {
    $title = "";
}


Like everyone else, this has the "use a parser, not regex" disclaimer. However, if you still want regex, look at this:

$string = "<title>I am a title</title>";
$regex = "!(<title[^>]*>)(.*)(</title>)!i";
preg_match($regex, $string, $matches);
print_r($matches);

// Should output (index 0 holds the full match):
// Array
// (
//     [0] => <title>I am a title</title>
//     [1] => <title>
//     [2] => I am a title
//     [3] => </title>
// )