Inconsistent POSTing between Web Browser and HttpWebRequest
I’m working on web scraping using C# HttpWebRequest/HttpWebResponse. For the most part this process has gone smoothly. But after POSTing my way through several pages, I have gotten stuck on what seems to be an inconsistency between testing with the web browser and the HttpWebRequest/HttpWebResponse calls.
The problem occurs when I land on a page containing an input element that has a name similar to this: “RidiculouslyLongInputName.RidiculouslyLongInputName.RidiculouslyLongInputName.@RidiculouslyLong”
POSTing a value for this input element causes a 500 error when using HttpWebRequest but works fine when POSTing through the browser. If I remove this input value from the post data, the HttpWebRequest no longer gets the 500 error, but then I'm stuck with a data validation error from the website.
Any idea on why HttpWebRequest is failing?
It's times like these when packet sniffers come in extremely useful for seeing exactly what kind of data is flowing through and what the difference is.
Wireshark (http://www.wireshark.org/) is a great tool for things like this.
Filter down to only the domains you're interested in, then send the request with HttpWebRequest. Save the captured data somewhere. Repeat, but make the request through the browser. Compare the two captures.
If it is indeed an issue with POST variables, it should be evident in the HTTP payload.
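One thing worth checking in the captured payload is how the field name itself is encoded, since the name in the question contains an "@". The following is only a sketch of building a form body by hand, assuming the page posts as application/x-www-form-urlencoded; the class and method names (FormEncoding, BuildBody) are made up for illustration, and nothing here is confirmed to be the cause of the 500 error.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoding
{
    // Builds an application/x-www-form-urlencoded body,
    // percent-encoding both the field names and the values.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f => Encode(f.Key) + "=" + Encode(f.Value)));
    }

    private static string Encode(string text)
    {
        // Uri.EscapeDataString escapes reserved characters such as '@', '&' and '='.
        // Browsers send a space as '+' in form posts, so "%20" is converted to match.
        return Uri.EscapeDataString(text).Replace("%20", "+");
    }
}
```

Comparing the output of BuildBody with the body the browser sends in the capture should show quickly whether the two differ in how the long field name is encoded.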
Not sure why you are running into the problem, but I would recommend grabbing a copy of Fiddler and taking a look at what the browser is sending in the POST request. It is possible there is something less than obvious going on.
You can also use the Firebug extension with Firefox. With this extension installed and enabled, go through the entire scenario in Firefox. Firebug will show you the exact request and response exchanged by the browser. You can then duplicate that as closely as possible using HttpWebRequest.
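As a rough sketch of what "duplicating the browser" can look like with HttpWebRequest: the class name BrowserLikePost and every header value below are placeholders I made up, so copy the real values from your own capture. The Expect100Continue line is an assumption about a common difference between HttpWebRequest and browsers, not something confirmed for this site.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class BrowserLikePost
{
    // Builds (but does not send) a POST request whose headers mirror a captured browser request.
    // All header values are placeholders; replace them with the ones from the capture.
    public static HttpWebRequest Create(string url, string referer, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = "Mozilla/5.0 (placeholder; copy from the capture)";
        request.Accept = "text/html,application/xhtml+xml,*/*";
        request.Referer = referer;
        request.Headers["Accept-Language"] = "en-US,en;q=0.5";
        // Reuse one CookieContainer across all pages so session cookies carry over.
        request.CookieContainer = cookies;
        // On .NET Framework, HttpWebRequest adds "Expect: 100-continue" to POSTs by default;
        // browsers do not send it, so it is switched off here.
        request.ServicePoint.Expect100Continue = false;
        return request;
    }

    // Writes the already-encoded form body and returns the response text.
    public static string Send(HttpWebRequest request, string body)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(body);
        request.ContentLength = bytes.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Passing the same CookieContainer to every request in the sequence matters as much as the headers, since the earlier pages usually set the session cookies the later POSTs depend on.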
First, thanks for the MEF response. That case was a personal mistake, so I deleted the question.
I think the best tool for your case is Fiddler, but I suspect there is JavaScript attached to that button, or something similar, that you are failing to mimic. WebRequest cannot do that for you, whereas WebBrowser can, since it works on the DOM.
To use WebRequest correctly, you really need to reverse engineer every request with something like Fiddler. It's very hard to work out exactly what is going on just by looking at the page's source (and its referenced JavaScript/CSS...).