PHP using preg_match to get title from article
I am having a strange problem with preg_match. I am using a regular expression that grabs the title of an article; basically, it looks for the <title> tag:
preg_match('#(\<title.*?\>)(\n*\r*.+\n*\r*)(\<\/title.*?\>)#', $data, $matches)
When I print out the $matches array I get nothing. But when I try the same thing in a regular expression tester, it works fine. I have even tried putting in a string that would definitely match it in place of the $data variable, without any luck.
What am I doing wrong here?
If you still want to use regex and not DOM, here's what you can do:
if(preg_match("/<title>(.+)<\/title>/i", $data, $matches))
print "The title is: $matches[1]";
else
print "The page doesn't have a title tag";
Or you could use, you know, an HTML parser for HTML:
$dom = new DOMDocument;
$dom->loadHTML($HTML);
echo $dom->getElementsByTagName('title')->item(0)->nodeValue;
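The DOM snippet above assumes the page parses cleanly and actually contains a title. A slightly more defensive sketch follows; the helper name extractTitleWithDom and the sample markup are made up for illustration and are not part of the answer. It silences libxml warnings for sloppy HTML and returns null when there is no <title> element.

```php
<?php
// Hypothetical helper (name is an assumption): returns the <title> text, or null.
function extractTitleWithDom(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse; also avoids handing loadHTML() an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard collected warnings and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // item(0) is null when the document has no <title> element.
    $node = $dom->getElementsByTagName('title')->item(0);

    return $node === null ? null : trim($node->textContent);
}

// Made-up sample input with a title spread over several lines.
$html = "<html><head><title>\n  I am a title\n</title></head><body><p>Hi</p></body></html>";
var_dump(extractTitleWithDom($html));
```

Because the parser deals with attributes, letter case, and line breaks inside the tag, none of those need special handling here.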
Works for me:
preg_match("/<title>(.*)<\/title>/is", $html, $matches);
From this source: https://gist.github.com/jeremiahlee/785770
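The s modifier in that pattern is probably what makes the difference for real pages: without it, the dot does not match newlines, so a title that spans several lines is never matched. A small sketch with made-up input shows both cases; the lazy .*? is my own substitution, so that matching stops at the first closing tag.

```php
<?php
// Sample input is made up for illustration: the title spans several lines.
$html = "<head><title>\nMy\nArticle\n</title></head>";

// Without "s", "." stops at newlines, so the multi-line title does not match.
$withoutS = preg_match('/<title>(.*)<\/title>/i', $html, $m1);

// With "s", "." also matches newlines. The lazy ".*?" stops at the first </title>.
$withS = preg_match('/<title>(.*?)<\/title>/is', $html, $m2);

var_dump($withoutS);      // int(0)
var_dump($withS);         // int(1)
var_dump(trim($m2[1]));   // the two words "My" and "Article", separated by a newline
```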
You may need to backslash-quote your backslashes.
PHP's string parser removes one layer of backslashes (in a single-quoted string only \\ and \' are treated as escapes; a double-quoted string processes many more), and then the regular-expression engine consumes another layer, so, for example, recognizing a literal backslash requires FOUR of them in the source code.
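To make the two layers concrete, here is a short sketch using single-quoted strings; the sample path is made up. It also checks how a sequence like \< behaves: in a single-quoted string it reaches PCRE unchanged, where it simply means a literal <.

```php
<?php
// Single quotes: "\t" is not an escape here, so this string is C:\temp
$path = 'C:\temp';

// Four backslashes in source -> two in the PHP string -> one literal backslash for PCRE.
$foundBackslash = preg_match('/\\\\/', $path);
var_dump($foundBackslash);   // int(1)

// '\<' and '\\<' are the same two-character string: a backslash followed by "<".
$sameString = ('\<' === '\\<');
var_dump($sameString);       // bool(true)

// PCRE reads "\<" and "\>" as the literal characters "<" and ">".
$matchedTag = preg_match('#\<title\>#', '<title>');
var_dump($matchedTag);       // int(1)
```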
Beyond that, you might try taking advantage of the XML recognition stuff in PHP, or do less clever string handling. Usually when regexes break, it's because you're trying to be too clever with them. Consider looking only for the opening and closing title tags, cutting out everything between them, and then stripping the surrounding whitespace from that string, and VOILA! A title.
See also http://php.net/manual/en/book.simplexml.php
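The "less clever string handling" idea could be sketched roughly as follows, using only plain string functions. The helper name extractTitleWithStrings is hypothetical, and the sketch makes a simplifying assumption: it treats the first occurrence of "<title" as the title tag, so it would be fooled by an element whose name merely starts with "title".

```php
<?php
// Hypothetical helper (name is an assumption): plain string search, no regex.
function extractTitleWithStrings(string $html): ?string
{
    // Case-insensitive search for the start of the opening tag.
    $open = stripos($html, '<title');
    if ($open === false) {
        return null;
    }

    // The title text starts right after the ">" that ends the opening tag.
    $tagEnd = strpos($html, '>', $open);
    if ($tagEnd === false) {
        return null;
    }
    $contentStart = $tagEnd + 1;

    // The title text ends where the closing tag begins.
    $close = stripos($html, '</title', $contentStart);
    if ($close === false) {
        return null;
    }

    // Cut out the text in between and strip surrounding whitespace.
    return trim(substr($html, $contentStart, $close - $contentStart));
}

// Made-up sample input: upper-case tag, an attribute, and line breaks.
$html = "<HTML><HEAD><TITLE lang=\"en\">\r\n  Hello World \r\n</TITLE></HEAD></HTML>";
var_dump(extractTitleWithStrings($html));
```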
Try this:
if (preg_match('%(<title[^>]*>)(.+?)(</title\s*>)%is', $data, $matches)) {
    // Group 2 holds the text between the opening and closing tags.
    $title = trim($matches[2]);
} else {
    $title = "";
}
Like everyone else, this has the "use a parser, not regex" disclaimer. However, if you still want regex, look at this:
$string = "<title>I am a title</title>";
$regex = "!(<title[^>]*>)(.*)(</title>)!i";
preg_match($regex, $string, $matches);
print_r($matches);
// should output:
// Array
// (
//     [0] => <title>I am a title</title>
//     [1] => <title>
//     [2] => I am a title
//     [3] => </title>
// )