WebRequest "HEAD" light weight alternative

2023-02-18 04:41 问答作者：

I recently discovered that the following does not work with certain sites, such as IMDB.com.

class Program
    {
        static void Main(string[] args)
        {
            try
            {
            开发者_如何学Python    System.Net.WebRequest wc = System.Net.WebRequest.Create("http://www.imdb.com"); //args[0]);

                ((HttpWebRequest)wc).UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19";
                wc.Timeout = 1000;
                wc.Method = "HEAD";
                WebResponse res = wc.GetResponse();
                var streamReader = new System.IO.StreamReader(res.GetResponseStream());

                Console.WriteLine(streamReader.ReadToEnd());
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }

It returns an HTTP 405 ( Method Not Allowed ). My problem is, I use code very similar to the above to check if a link is valid and the vast majority of times it works correctly. I can switch it to method equal GET and it works ( with an increase in timeout ), but this slows things down by an order of magnitude. I am assuming the 405 response is a server configuration on IMDB's server side.

Is there a way for me to do the same thing as above, in a light weight manner in .NET? Or, is there a way to fix the above code so it works as a GET request that works with imdb?

Open the connection yourself with a socket (instead of an HttpRequest or WebClient), and close the stream as soon as you've read the status code. Fortunately the status code comes near the top of the response stream :)

You'll have to clarify what you mean by "lightweight". What are you trying to accomplish?

Whether or not you can use GET/POST/HEAD/DELETE/etc will depend on the URL and what's configured in the application that is running on the server at that URL.

If all you're trying to do is see if you can make a connection without actually downloading the content you could maybe try just initiating a connection to port 80 using sockets, but there isn't really reliable or universally supported way just by changing the HTTP method.

If HEAD returns a 405, that means the server doesn't support HEAD (at least for that URL) and you'll have fall back to GET instead. The majority of sites should support HEAD, so you probably want to do HEAD by default, but if it throws a 405, you could maybe fall back to GET for that domain. Or maybe you want to try HEAD first for each request; YMMV.

If the server requires GET and you want to reduce network traffic, you could try doing a conditional GET and/or a partial GET (see e.g. RFC2616). I've never tried doing those with WebRequest but I think it lets you add custom outgoing HTTP headers, so you should be able to do it.

Also, don't forget that, if you're writing a spider (which you clearly are), you should respect the server's robots.txt, and it's also courteous to throttle your requests to something like one request every two seconds, so you don't slashdot the server.

继续阅读：.net http-status-code-405 webrequest

WebRequest "HEAD" light weight alternative

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？