Check duplicate content without doing a GET
One of the main purposes of URL normalization is to avoid GET
requests on distinct URLs that produce the exact same result.
Now, I know that you can check for the canonical tag
and even compare two URLs' HTML to see whether they're the same; however, you have to download the same resource twice to do this, which defeats the purpose I stated above.
Is there a way to check for duplicated content using only a HEAD request? If not, is there a way to download only the <head>
section of a web page without downloading the entire document?
I can think of solutions for the last one; I just want to know if there's a direct one.
According to the MSDN documentation, the solution to your question is as follows (note that the request method is set to HEAD, so only the headers are transferred, never the body):
Dim myHttpWebRequest As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
' Ask for headers only; the server will not send a response body.
myHttpWebRequest.Method = "HEAD"
Dim myHttpWebResponse As HttpWebResponse = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
Console.WriteLine(ControlChars.CrLf + "The following headers were received in the response")
' Headers is a WebHeaderCollection; walk it and print each name/value pair.
Dim i As Integer = 0
While i < myHttpWebResponse.Headers.Count
    Console.WriteLine("Header Name: {0}, Value: {1}", myHttpWebResponse.Headers.Keys(i), myHttpWebResponse.Headers(i))
    i = i + 1
End While
myHttpWebResponse.Close()
Let me explain this code. The first line creates an HttpWebRequest for the specified URL, and setting Method to "HEAD" tells the server to return only the response headers, not the document body. GetResponse then retrieves those headers. The Headers property is a WebHeaderCollection; the loop traverses the collection and displays each header's name and value, and the final line closes the response.

If you want only the <head> portion of the web page itself, a PHP class for that is freely available at http://www.phpclasses.org/package/4033-PHP-Extract-HTML-contained-in-tags-from-a-Web-page.html
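Building on that, here is a minimal sketch (my own addition, not from the MSDN example) of how those headers could be used as a duplicate-content heuristic: issue a HEAD request for each URL and compare validator headers such as ETag, Content-Length and Last-Modified. Keep in mind that servers are not obliged to send these headers, some reject HEAD entirely, and matching headers do not guarantee byte-identical bodies. The URLs and the GetValidators helper are made up for illustration.

Imports System
Imports System.Net

Module HeadCompare
    ' Collects a few headers that usually change whenever the content changes.
    Function GetValidators(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD"   ' headers only, no body is transferred
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Return String.Join("|", response.Headers("ETag"),
                                    response.Headers("Content-Length"),
                                    response.Headers("Last-Modified"))
        End Using
    End Function

    Sub Main()
        Dim a As String = GetValidators("http://example.com/page?x=1")
        Dim b As String = GetValidators("http://example.com/page/?x=1")
        If a = b Then
            Console.WriteLine("Headers match - likely the same resource (still only a heuristic).")
        Else
            Console.WriteLine("Headers differ - probably distinct content.")
        End If
    End Sub
End Module

If both URLs return the same ETag in particular, the odds are very good that a GET would return the same body, so you can skip it.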
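And if you would rather not pull in the PHP class, here is a rough sketch of downloading only the <head> section directly in VB.NET (again my own addition): read the response stream in small chunks and stop as soon as the closing </head> tag shows up, so the rest of the document is never transferred. This simplified version assumes ASCII-compatible markup and ignores the corner case of a multi-byte character split across chunk boundaries.

Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module HeadSection
    Function DownloadHeadOnly(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using stream As Stream = response.GetResponseStream()
                Dim buffer(4095) As Byte
                Dim page As New StringBuilder()
                Dim bytesRead As Integer = stream.Read(buffer, 0, buffer.Length)
                While bytesRead > 0
                    page.Append(Encoding.UTF8.GetString(buffer, 0, bytesRead))
                    Dim closing As Integer = page.ToString().IndexOf("</head>", StringComparison.OrdinalIgnoreCase)
                    If closing >= 0 Then
                        ' Found the end of the head section; abandon the rest of the body.
                        Return page.ToString(0, closing + "</head>".Length)
                    End If
                    bytesRead = stream.Read(buffer, 0, buffer.Length)
                End While
                Return page.ToString()   ' no </head> found; return whatever was read
            End Using
        End Using
    End Function
End Module

Whether this actually saves much depends on how early </head> appears and on the chunk size, but for typical pages it avoids transferring the bulk of the body.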