开发者

fetch any user input html tag from html document live url using php regular expression

I want to fetch any meta, title, script, link tag that is available on HTML page, this is the program i write (not correct but will give idea for experts).

<?php
function get_tag($tag_name, $url)
{
    $content = file_get_contents($url);

    // this is not correct : regular expression please //
    preg_match_all($tag_name, $content, $matches);

    return $matches;
}

print_r(get_tag('title', 'http://stackoverflow.com'));

?>

Output shoul开发者_开发百科d come something like this :

Array
(
    [0] => title
    [1] => Stack Overflow
)

Thanks!!


function get_tags($tag, $url) {
//allow for improperly formatted html
libxml_use_internal_errors(true);
// Instantiate DOMDocument Class to parse html DOM
$xml = new DOMDocument();

// Load the file into the DOMDocument object
$xml->loadHTMLFile($url);

// Empty array to hold all links to return
$tags = array();

//Loop through all tags of the given type and store details in the array
foreach($xml->getElementsByTagName($tag) as $tag_found) {
      if ($tag_found->tagName == "meta")
      {
        $tags[] = array("meta_name" => $tag_found->getAttribute("name"), "meta_value" => $tag_found->getAttribute("content"));
      }
      else {
    $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
     }
}

//Return the links
return $tags;
}

This answer will actually give you the name of the tag as your first array value rather than "array" and will also stop the warning.


Before using regex for parsing HTML, you want to read the first answer from this question.

Try with DOMDocument, like this:

<?

function get_tags($tags, $url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $tags_found = array();

    //Loop through each <$tags> tag in the dom and add it to the $tags_found array
    foreach($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    //Return the links
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>


Since these tags cannot be nested, parsing is not necessary.

#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

If you are using this with your function, you will have to write $tag_name instead "meta|title|script|link".

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜