How do I retrieve a URL's protocol ("http" or "https")?

I am using the PHP library Simple HTML DOM Parser, as suggested in "How do you parse and process HTML/XML in PHP?", to parse a web page's HTML content.

To create the DOM, I have to do:

$html = file_get_html('http://www.example.com/');

The problem is that if I do:

$html = file_get_html('www.example.com');

without specifying the URL's protocol, I will get an error.

My question is: given only the string "www.example.com", how can I find out whether the full URL is "http://www.example.com/" or "https://www.example.com/"?


I can't think of anything smarter than assuming "http://" as the default and, if that fails, trying "https://":

$html = file_get_html('http://' . $url);
if (!$html) {
    $html = file_get_html('https://' . $url);
}
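The same fallback idea can be wrapped in a small helper that tries each scheme in turn and also tells you which one worked. This is only a sketch and not part of Simple HTML DOM: `fetch_with_scheme_fallback` is a hypothetical name, and it assumes the fetcher you pass in (for example `file_get_html`) returns `false` when a request fails.

```php
<?php
/**
 * Hypothetical helper: try each scheme in order and return the first
 * URL/result pair whose fetch did not return false.
 *
 * $fetch is any callable taking a full URL and returning false on failure
 * (an assumption that matches how file_get_html reports failure).
 */
function fetch_with_scheme_fallback(string $hostAndPath, callable $fetch, array $schemes = ['http', 'https']): array
{
    foreach ($schemes as $scheme) {
        $url = $scheme . '://' . $hostAndPath;
        $result = $fetch($url);
        if ($result !== false) {
            // First scheme that worked wins.
            return [$url, $result];
        }
    }
    // Every scheme failed.
    return [null, false];
}
```

Usage: `[$url, $html] = fetch_with_scheme_fallback('www.example.com', 'file_get_html');` leaves the working URL in `$url`. Note that a failed connection may still emit a PHP warning from the underlying stream call.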


There is no way to know in advance, because both could be valid. I would assume http:// though, because the usual practice is for a site that requires HTTPS to redirect http requests to https, and file_get_html should follow an HTTP 301 or 302 redirect.
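Following that advice, you only need to prepend http:// when the string carries no scheme at all, and let the server's redirect handle sites that require HTTPS. A minimal sketch, assuming that anything starting with "scheme://" is already complete; `with_default_scheme` is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: prepend a default scheme only when the string does
 * not already start with "scheme://". Protocol-relative input such as
 * "//www.example.com" has its leading slashes stripped first.
 */
function with_default_scheme(string $url, string $default = 'http'): string
{
    // RFC 3986 scheme syntax: a letter followed by letters, digits, "+", "." or "-".
    if (preg_match('~^[a-z][a-z0-9+.\-]*://~i', $url)) {
        return $url;
    }
    return $default . '://' . ltrim($url, '/');
}
```

For example, `file_get_html(with_default_scheme('www.example.com'))` requests `http://www.example.com` and relies on any redirect to reach the HTTPS version.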


You could use get_headers() on the http:// address and check the response for an Upgrade: header. If you get a valid response, use http; otherwise, try https.
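A sketch of that check, written as a pure function over the array that get_headers() returns in its default (indexed) format, so it can be tried without network access. Two parts are assumptions added here rather than taken from the answer: the Upgrade test only counts a value mentioning TLS (as in RFC 2817's `Upgrade: TLS/1.0`), and a `Location:` header pointing at an https:// URL is also treated as a signal to use https. `scheme_from_headers` is a hypothetical name.

```php
<?php
/**
 * Hypothetical heuristic: pick "http" or "https" from the header lines of a
 * request made to the http:// address.
 *
 * $headers is either false (get_headers() failed) or an indexed array such as
 *   ['HTTP/1.1 301 Moved Permanently', 'Location: https://www.example.com/', ...]
 */
function scheme_from_headers($headers): string
{
    if ($headers === false) {
        // No valid response over http://, so fall back to https.
        return 'https';
    }
    foreach ($headers as $line) {
        // Assumption: a redirect to an https:// URL means the site wants HTTPS.
        if (preg_match('~^Location:\s*https://~i', $line)) {
            return 'https';
        }
        // Assumption: an Upgrade header offering TLS also means HTTPS.
        if (preg_match('~^Upgrade:.*\bTLS~i', $line)) {
            return 'https';
        }
    }
    // A valid response with no upgrade or https redirect: stay on http.
    return 'http';
}
```

Usage: `$scheme = scheme_from_headers(@get_headers('http://' . $url));`. Because get_headers() typically follows redirects, the array can contain the header blocks of several responses, which is why the whole list is scanned.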
