Check duplicate content without doing a GET

One of the main purposes of URL normalization is to avoid GET requests on distinct URLs that produce the exact same result.

Now, I know that you can check for the canonical tag and even compare the two URLs' HTML to see if they're the same; however, you have to download the same resource twice to do this, which defeats the purpose I stated above.

Is there a way to check for duplicated content using only a HEAD request? If not, is there a way to download only the <head> section of a web page without downloading the entire document?

I can think of solutions for the last one; I just want to know if there's a direct one.
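One workaround for the second question (a sketch in Python, assuming a reachable URL): stream the body in small chunks and stop reading as soon as `</head>` appears. The server still begins sending the full document, but you only transfer the first few kilobytes.

```python
from urllib.request import urlopen

HEAD_END = b"</head>"

def head_complete(buffer):
    """Return the buffer truncated just past </head>, or None if not yet seen."""
    end = buffer.lower().find(HEAD_END)
    if end == -1:
        return None
    return buffer[: end + len(HEAD_END)]

def fetch_head_section(url, chunk_size=1024, max_bytes=65536):
    """Stream the response and stop reading once </head> has arrived."""
    data = b""
    with urlopen(url) as resp:
        while len(data) < max_bytes:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            data += chunk
            head = head_complete(data)
            if head is not None:
                return head
    return data  # </head> not found within max_bytes; return what we have
```

Note this is a heuristic: `</head>` inside a comment or script would fool it, and `max_bytes` caps pages that never close the tag.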


According to the MSDN documentation, the solution to your question is as follows:

' Requires Imports System.Net
Dim myHttpWebRequest As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
' Use HEAD so the server returns only the headers, not the body
myHttpWebRequest.Method = "HEAD"
Dim myHttpWebResponse As HttpWebResponse = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
Console.WriteLine(Environment.NewLine & "The following headers were received in the response")
Dim i As Integer = 0
While i < myHttpWebResponse.Headers.Count
    Console.WriteLine("Header Name: {0}, Value: {1}", myHttpWebResponse.Headers.Keys(i), myHttpWebResponse.Headers(i))
    i += 1
End While
myHttpWebResponse.Close()

Let me explain this code. It creates an HttpWebRequest for the specified URL, obtains the response, and then traverses the Headers property (a WebHeaderCollection) to display each header's name and value before closing the response. If you only want the <head> portion of a web page, a PHP class for extracting content contained in tags is freely available at http://www.phpclasses.org/package/4033-PHP-Extract-HTML-contained-in-tags-from-a-Web-page.html
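The same idea expressed in Python (a sketch, not the MSDN code): issue a HEAD request for each URL and compare headers that often identify the underlying content, such as ETag, Content-Length, and Last-Modified. The header names are standard HTTP, but whether a given server sets them varies, so treat this as a heuristic.

```python
from urllib.request import Request, urlopen

FINGERPRINT_HEADERS = ("ETag", "Content-Length", "Last-Modified")

def head_fingerprint(url):
    """Issue a HEAD request and collect headers that often identify content."""
    req = Request(url, method="HEAD")  # HEAD: the server sends headers only
    with urlopen(req) as resp:
        return tuple(resp.headers.get(name) for name in FINGERPRINT_HEADERS)

def same_fingerprint(fp_a, fp_b):
    """Match only when the fingerprints are equal AND at least one header
    was actually present -- two all-None fingerprints prove nothing."""
    return fp_a == fp_b and any(v is not None for v in fp_a)
```

A matching ETag is a strong duplicate signal; an equal Content-Length alone is weak. Since all three headers are optional, a negative or empty result still leaves the question open.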
