Problem with HttpWebRequest (503)

I am using HttpWebRequest to GET search results from Google through a proxy. For some queries the data comes back, but for others the request throws the exception "The remote server returned an error: (503) Server Unavailable." One might think the proxy is bad, but when I configure the same proxy in Internet Explorer and open Google, the page loads with no 503 error. HttpWebRequest fails only on certain queries. For example, if you request

http://www.google.com/search?q=site:http://www.yahoo.com 

it throws the exception, whereas if you request

http://www.google.com/search?q=info:http://www.yahoo.com

it works.

My code so far is:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(file);
request.ProtocolVersion = HttpVersion.Version11;
request.Method = "GET";
request.KeepAlive = false;
request.ContentType = "text/html";
request.Timeout = 1000000000;
request.ReadWriteTimeout = 1000000000;
request.UseDefaultCredentials = true;
request.Credentials = CredentialCache.DefaultCredentials;

Uri newUri = new Uri("http://" + proxy[selectedProxy].ProxyAddress.Trim() + "/");
WebProxy myProxy = new WebProxy();
myProxy.Credentials = CredentialCache.DefaultCredentials;
myProxy.Address = newUri;
request.Proxy = myProxy;

WebResponse response = request.GetResponse();
// System.Threading.Thread.Sleep(Delay);
StreamReader reader = null;
string data = null;
reader = new StreamReader(response.GetResponseStream());
data = reader.ReadToEnd();


You are being hit with Google's "sorry, you look like a spambot" page, and you will need to either solve the captcha or change proxy to continue. GetResponse() throws a WebException on a 503 status, so you do not get the page contents by default, whereas a browser simply displays the body of the 503 response.


That's weird. It may be a URL encoding issue. Try the following, which should encode the query properly (HttpUtility requires a reference to System.Web):

using System;
using System.Net;
using System.Web;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            var newUri = new Uri("http://proxy.foo.com/");
            var myProxy = new WebProxy();
            myProxy.Credentials = CredentialCache.DefaultCredentials;
            myProxy.Address = newUri;
            client.Proxy = myProxy;

            var query = HttpUtility.ParseQueryString(string.Empty);
            query["q"] = "info:http://www.yahoo.com";
            var url = new UriBuilder("http://www.google.com/search");
            url.Query = query.ToString();
            Console.WriteLine(client.DownloadString(url.ToString()));
        }
    }
}
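
To see what the encoding step changes, here is a minimal sketch that percent-encodes the search term with Uri.EscapeDataString instead of HttpUtility, so it needs no System.Web reference. The names QueryEncoding and BuildSearchUrl are made up for illustration, and whether encoding is the actual cause of the 503 is not confirmed by anything in this thread.

```csharp
using System;

static class QueryEncoding
{
    // Hypothetical helper: percent-encodes a single query-string value so that
    // the ':' and '/' inside the search term are not read as URL structure.
    public static string BuildSearchUrl(string term)
    {
        return "http://www.google.com/search?q=" + Uri.EscapeDataString(term);
    }
}
```

Calling QueryEncoding.BuildSearchUrl("site:http://www.yahoo.com") returns http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fwww.yahoo.com, which can then be passed to WebRequest.Create or client.DownloadString.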


It depends on how often you query Google from the same IP address. If you send queries too fast, Google blocks your IP address. When this happens, Google returns a 503 error with a redirect to its "sorry" page.

Do something like this:

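HttpWebResponse response = null;
string html = null;
try
{
    response = (HttpWebResponse)webRequest.GetResponse();
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response was received (e.g. a timeout).
    if (ex.Response != null)
    {
        using (var sr = new StreamReader(ex.Response.GetResponseStream()))
        {
            // Body of the error response, e.g. the 503 page.
            html = sr.ReadToEnd();
        }
    }
}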

When debugging, inspect the value of the html variable. You will see that it is an HTML page asking you to fill in a captcha code.
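
Since the block is triggered by querying too fast, one option is to slow down whenever a 503 is seen. The following is only a sketch under assumptions: the class name Throttle, both method names, and the doubling policy with a cap are all made up for illustration, and nothing here reflects limits published by Google.

```csharp
using System;
using System.Net;

static class Throttle
{
    // Hypothetical helper: true when the exception carries an HTTP 503 response.
    public static bool IsServiceUnavailable(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        return http != null && http.StatusCode == HttpStatusCode.ServiceUnavailable;
    }

    // Assumed policy: double the wait after each consecutive failure,
    // starting at baseSeconds and never exceeding maxSeconds.
    public static TimeSpan NextDelay(int failures, double baseSeconds, double maxSeconds)
    {
        double seconds = baseSeconds * Math.Pow(2, Math.Max(0, failures));
        return TimeSpan.FromSeconds(Math.Min(seconds, maxSeconds));
    }
}
```

In the catch block above, you could call Throttle.IsServiceUnavailable(ex) and, if it returns true, sleep for Throttle.NextDelay(failures, 5, 300) before the next query. The values 5 and 300 are placeholders, not recommendations.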
