Extract doctype with simple_html_dom

2022-12-08 06:21

I am using simple_html_dom to parse a website. Is there a way to extract the doctype?

You can use the file_get_contents function to fetch the raw HTML from the website, then pull the doctype out with a regular expression. For example:

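<?php
   // Fetch the raw HTML; file_get_contents returns false on failure.
   $html = file_get_contents("http://google.com");
   // Match the first doctype declaration, case-insensitively.
   // [^>]* also covers the short HTML5 form and declarations that span several lines.
   $doctype = null;
   if ($html !== false && preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
      $doctype = $matches[0];
   }
?>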
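
To try the regular-expression step without a network request, it can be wrapped in a small function that works on a string. This is only a sketch: extract_doctype is a hypothetical helper name, not part of PHP or simple_html_dom, and it assumes the declaration contains no '>' before its end (so a doctype with an internal subset would be cut short).

```php
<?php
// Hypothetical helper: returns the first doctype declaration in $html, or null if none is found.
function extract_doctype(string $html): ?string
{
    // [^>]* stops at the first '>' and also matches newlines inside the declaration.
    if (preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

echo extract_doctype('<!doctype html><html></html>'), PHP_EOL; // prints <!doctype html>
```

It returns null rather than raising a notice when the page has no doctype, so the caller can test the result before using it.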

You can use $html->find('unknown'). This works in at least version 1.11 of the simplehtmldom library. I use it as follows:

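// Returns the first 'unknown' node directly under the root (assumed to be the doctype), or null.
function get_doctype($doc)
{
    $els = $doc->find('unknown');

    foreach ($els as $el) {
        // Only accept top-level nodes; skip 'unknown' nodes nested deeper in the document.
        if ($el->parent()->tag == 'root') {
            return $el;
        }
    }

    return null;
}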

The parent check is just there to skip any other 'unknown' elements which might be found deeper in the document; I'm assuming the first one directly under the root will be the doctype. You can explicitly inspect ->innertext if you want to ensure it starts with '!DOCTYPE ', though.
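
If you do want that explicit check, one possible sketch is below. looks_like_doctype is a hypothetical helper name, and because I'm not sure whether ->innertext includes the leading '<' in every version of the library, the pattern accepts the text with or without it.

```php
<?php
// Hypothetical helper: true if $text looks like a doctype declaration.
function looks_like_doctype(string $text): bool
{
    // Optional leading whitespace and '<', then '!DOCTYPE' as a whole word, case-insensitively.
    return preg_match('/^\s*<?!DOCTYPE\b/i', $text) === 1;
}
```

Inside the loop in get_doctype, the condition could then be extended with looks_like_doctype($el->innertext) so that only a matching top-level node is returned.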
