Extracting string between <title> and </title> using PHP [duplicate]
Possible Duplicates:
(PHP5) Extracting a title tag and RSS feed address from HTML using PHP DOM or Regex
Grabbing title of a website using DOM
I am trying to run through a hundred different html files on my server, and extract the titles for use in another php file.
For reference:
<title>Generic Test Page</title>
What I need is a function that will return the string "Generic Test Page" and stick that into a global variable.
What I am doing right now is simply reading the file into an array called $lines. For each $line in $lines, I test for the string <title> ... but how do I extract only what's between the > and the </title>?
My trouble is that sometimes the original developer decided to elaborate on the title: <title name=title class=title1>, or he put it on three lines instead of one. What in the world? So I can't just strip the first seven characters and the last eight characters. Which would be so nice...
Thank you!!
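You need to use something like PHP Simple HTML DOM Parser:
require_once 'simple_html_dom.php'; // adjust to wherever you put the library
function get_page_title($html_file) {
    $html = file_get_html($html_file);
    if (!$html) {
        return null; // file could not be read or parsed
    }
    $node = $html->find('title', 0); // null when the page has no <title>
    return $node ? trim($node->plaintext) : null;
}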
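If adding a third-party library is not an option, PHP's bundled DOM extension can probably do the same job. This is only a minimal sketch: the function name get_title_with_dom is made up for illustration, it assumes the DOM extension is enabled, and it takes the HTML as a string rather than a file name. Collapsing whitespace is my own addition, to cope with a title spread over three lines.

```php
<?php
// Hypothetical helper: returns the <title> text of an HTML string,
// or null when no title can be found.
function get_title_with_dom(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Keep libxml quiet about sloppy real-world markup.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    // Collapse line breaks and repeated spaces inside a multi-line title.
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}
```

You would call it as get_title_with_dom(file_get_contents($html_file)). Attributes on the tag are ignored, since the parser only hands back the element's text.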
$pattern = '/<title[^>]*>(.*?)<\/title>/is';
foreach ($lines as $line) { // $lines is the array from the question
    if (preg_match($pattern, $line, $match)) {
        return trim($match[1]); // your title! (only if the whole tag sits on one line)
    }
}
Or just run the pattern against the whole HTML and return the match; that also covers a title split across several lines.
Or use what scurker has suggested.
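Running the pattern against the whole file could look roughly like this. It is a sketch under assumptions: the names get_title_with_regex and collect_titles are invented here, the files are assumed to end in .html and sit in one directory, and the result is returned as an array keyed by file path rather than stored in a global.

```php
<?php
// Hypothetical helper: applies the pattern above to a full HTML string.
function get_title_with_regex(string $html): ?string
{
    $pattern = '/<title[^>]*>(.*?)<\/title>/is';
    if (preg_match($pattern, $html, $match)) {
        // Collapse line breaks so a three-line title comes back as one line.
        return trim(preg_replace('/\s+/', ' ', $match[1]));
    }
    return null; // no <title> found
}

// Hypothetical helper: maps every .html file in $dir to its title (or null).
function collect_titles(string $dir): array
{
    $titles = [];
    // glob() may return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        if ($html !== false) {
            $titles[$file] = get_title_with_regex($html);
        }
    }
    return $titles;
}
```

For example, $titles = collect_titles('/path/to/pages'); gives you one array to pass to the other PHP file. A regex like this is fine for a known set of files, but it can be fooled by markup such as a commented-out title, which is where a real parser is safer.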
You should use a regular expression to extract the inner part. More info here