HttpWebRequest versus browser request
I used to retrieve data from a site (nseindia.com) using a C# program. However, NSE recently made some changes so that any request coming from a program is answered with a "403 Forbidden" error. Can anyone tell me how to make the request from my program identical to the one the browser sends? I tried setting the UserAgent property, but that's not working. The code is below.
using System;
using System.Net;

string DownloadData(string CompanyName)
{
    string address = "http://www.nseindia.com";
    //http://www.nseindia.com/marketinfo/sym_map/symbolMapping.jsp?dataType=priceVolumeDeliverable&symbol=abb&
    //http://www.nseindia.com/content/equities/scripvol/datafiles/01-12-2008-TO-29-12-2010ABBALLN.csv
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3";
    string strData = "";
    try
    {
        // Use the system proxy settings (WebProxy.GetDefaultProxy() is obsolete).
        request.Proxy = WebRequest.DefaultWebProxy;
        // Dispose the response and the reader even if reading fails.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (System.IO.StreamReader reader = new System.IO.StreamReader(
            response.GetResponseStream(), System.Text.Encoding.UTF8))
        {
            strData = reader.ReadToEnd();
        }
        // Treat an error page in the body as a failure.
        if (strData.Contains("Error"))
        {
            throw new Exception(strData);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.ToString());
    }
    return strData;
}
Your best bet is to spy on your browser's traffic to see exactly which requests it sends and which responses it receives.
There are numerous add-ins for that, depending on your browser.
Try setting the Accept HTTP header (the property takes only the header value), e.g.:
request.Accept = "text/html,application/xhtml+xml,application/xml";
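A minimal sketch of that fix, assuming .NET's System.Net.HttpWebRequest. Accept is a restricted header, so it is set through the property rather than Headers.Add, and HttpWebRequest writes the "Accept:" name itself. The class and method names are hypothetical; nothing is sent over the network until GetResponse() is called.

```csharp
using System;
using System.Net;

static class AcceptHeaderExample
{
    // Hypothetical helper: builds (but does not send) a GET request that
    // carries an Accept header, which the server in this thread seemed to require.
    public static HttpWebRequest CreateRequestWithAccept(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Value only: HttpWebRequest emits the "Accept: " prefix on its own.
        request.Accept = "text/html,application/xhtml+xml,application/xml";
        return request;
    }
}
```

You would then call GetResponse() on the returned request exactly as in the question's code.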
I arrived at this suggestion by running Fiddler2 (as suggested in a comment to another answer) in order to see how my browser (Firefox 4 Beta) makes the HTTP request to the website you mentioned.
I then set all of those headers in code and removed them one by one. As soon as I removed the Accept header, the 403 status code came back.
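The elimination procedure can also be automated. The sketch below is a hypothetical helper, not code from the original answer: it drops one header at a time and records each header whose absence makes the request fail. The `succeeds` callback is an assumption; in practice it would send a request with exactly the given headers and return false on a 403. Like the manual process, it only detects headers that are individually required.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HeaderElimination
{
    // Returns the names of headers that, when removed on their own,
    // cause `succeeds` to report failure (e.g. a 403 from the server).
    public static List<string> FindRequiredHeaders(
        IReadOnlyDictionary<string, string> browserHeaders,
        Func<IReadOnlyDictionary<string, string>, bool> succeeds)
    {
        var required = new List<string>();
        foreach (var name in browserHeaders.Keys)
        {
            // Same header set as the browser, minus the one under test.
            var withoutOne = browserHeaders
                .Where(h => h.Key != name)
                .ToDictionary(h => h.Key, h => h.Value);
            if (!succeeds(withoutOne))
                required.Add(name); // removing this header broke the request
        }
        return required;
    }
}
```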
Exact request made by my browser:
GET / HTTP/1.0
Host: www.nseindia.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
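If you want the program to mirror that capture as closely as possible, a sketch along these lines should work with HttpWebRequest. The class name is hypothetical. User-Agent and Accept are restricted headers and must be set through properties, while Accept-Language and Accept-Charset can go into the Headers collection. Setting AutomaticDecompression makes HttpWebRequest send an Accept-Encoding header for gzip/deflate and decompress the body for you, instead of adding that header by hand.

```csharp
using System;
using System.Net;

static class BrowserLikeRequest
{
    // Builds (without sending) a request whose headers mirror the
    // Firefox 4 beta capture above.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // The capture shows HTTP/1.0; HttpWebRequest defaults to HTTP/1.1.
        request.ProtocolVersion = HttpVersion.Version10;
        // Restricted headers: set via properties, not Headers.Add.
        request.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        // Unrestricted headers can be added directly.
        request.Headers.Add("Accept-Language", "de,en;q=0.5");
        request.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        // Request gzip/deflate and let HttpWebRequest decompress the response.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```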
PS: The other URIs you mention in the comments seem to be invalid. One is incomplete and yields a 500 Internal Server Error; the other yields a 404 Not Found response.
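To tell a 403, 404 or 500 apart in code rather than printing the whole exception, you can inspect the WebException that GetResponse() throws. The sketch below uses a hypothetical helper name. When Status is ProtocolError, the server did answer, and ex.Response holds the HttpWebResponse with the real status code; otherwise no HTTP response arrived at all (DNS failure, timeout, refused connection and so on).

```csharp
using System;
using System.Net;

static class FailureDiagnostics
{
    // Hypothetical helper for a catch (WebException ex) block.
    public static string DescribeFailure(WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError &&
            ex.Response is HttpWebResponse response)
        {
            // e.g. "HTTP 403 Forbidden"
            return $"HTTP {(int)response.StatusCode} {response.StatusDescription}";
        }
        // The request never produced an HTTP status code.
        return "No HTTP response: " + ex.Status;
    }
}
```

Typical use: wrap request.GetResponse() in try/catch (WebException ex) and log FailureDiagnostics.DescribeFailure(ex).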
Try setting the credentials to the defaults, like this:
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
or, if you need a username and password to access that web page:
NetworkCredential nc = new NetworkCredential("user", "password");
request.Credentials = nc;
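Both options can be folded into one small helper. This is a sketch with a hypothetical name: it falls back to the current Windows user's credentials unless an explicit login is supplied. As far as I know, HttpWebRequest sends these credentials only in response to an authentication challenge (HTTP 401), so they are most useful when the site actually asks for a login.

```csharp
using System;
using System.Net;

static class CredentialsExample
{
    // Hypothetical helper: default credentials unless a user name is given.
    public static void ApplyCredentials(HttpWebRequest request, string user = null, string password = null)
    {
        if (user == null)
            request.Credentials = CredentialCache.DefaultCredentials;
        else
            request.Credentials = new NetworkCredential(user, password);
    }
}
```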
Another option is to use the WebBrowser control ;)