Exception when downloading data from HTTPS site
I am working on a screen scraper for looking up tracking information on the Royal Mail website. Unfortunately Royal Mail does not offer an API, so scraping is the only way to do it.
I keep getting the same exception no matter what I do: "The remote server returned an error: (500) Internal Server Error."
My base code is:
    using System.Net;

    class Program
    {
        static void Main(string[] args)
        {
            string url = "http://track.royalmail.com/portal/rm/track?catId=22700601&gear=authentication&forcesegment=SG-Personal";
            byte[] response;
            WebClient webClient = new WebClient();
            // This call throws the WebException with the 500 status.
            response = webClient.DownloadData(url);
        }
    }
I have used Fiddler to inspect the requests my browser makes so that I can mimic them in my code. I can see that Royal Mail uses cookies, so I have tried to implement a WebClient that supports cookies by adding a cookie container to it:
    public class CookieAwareWebClient : WebClient
    {
        private CookieContainer m_container = new CookieContainer();

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            if (request is HttpWebRequest)
            {
                (request as HttpWebRequest).CookieContainer = m_container;
            }
            return request;
        }
    }
But that didn't help either :-(
I have also tried to look up the tracking information through Royal Mail's SSL-protected site (https://www.royalmail.com/portal/sme/track?catId=62200738&mediaId=63900708), supplying credentials from my C# program, but no luck there.
I have now hit a wall, and I keep bumping into the same tutorials / threads that don't seem to help me any further.
I hope there is a brilliant brain out there :-)
If you send all the headers a browser would send, you should stop getting the 500 error:
    string url = "http://track.royalmail.com/portal/rm/trackresults?catId=22700601&pageId=trt_rmresultspage&keyname=track_blank&_requestid=17931";
    using (WebClient webClient = new WebClient())
    {
        // Mimic the request headers Firefox sends.
        webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)";
        webClient.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        webClient.Headers["Accept-Language"] = "en-us,en;q=0.5";
        webClient.Headers["Accept-Encoding"] = "gzip,deflate";
        webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        byte[] response = webClient.DownloadData(url);
    }
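One caveat with the snippet above: because it sends Accept-Encoding: gzip,deflate, the server may reply with a compressed body, and as far as I know WebClient.DownloadData hands back those bytes without decompressing them. A minimal sketch of one way around that, assuming a WebClient subclass is acceptable (the DecompressingWebClient name and its Configure helper are made up for illustration and are not part of the original answer):

```csharp
using System;
using System.Net;

// Hypothetical helper class: a WebClient whose requests ask the framework
// to decompress gzip/deflate responses automatically.
public class DecompressingWebClient : WebClient
{
    // Turns on automatic decompression when the request is an HTTP request.
    // Non-HTTP requests (e.g. file://) are left untouched.
    public static void Configure(WebRequest request)
    {
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        Configure(request);
        return request;
    }
}
```

With this in place you would create a DecompressingWebClient instead of a WebClient and could probably drop the manual Accept-Encoding line, since HttpWebRequest should add that header itself once AutomaticDecompression is set. The same GetWebRequest override can also assign a CookieContainer, as in the question's CookieAwareWebClient, if the site turns out to need cookies as well.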