Trying to access a web page behind a login in C# to do web scraping
Ok, I am trying to grab some data off an internal web site that is run by a 3rd party. I seem to be able to log in via a POST and get the required cookies, but when I attempt to access the page behind the login page, it doesn't work; it still returns the original login page. Here is the code. I am not sure what I am doing wrong.
using System;
using System.IO;
using System.Net;
using System.Text;

public class FetchData
{
    public static void Main(string[] args)
    {
        StringBuilder sb = new StringBuilder();
        // used on each read operation
        byte[] buf = new byte[8192];
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("url_1");
        // url behind website login
        HttpWebRequest request1 = (HttpWebRequest)WebRequest.Create("url_2");
        request.Method = "POST";
        request.CookieContainer = new CookieContainer();
        request1.CookieContainer = new CookieContainer();
        string postData = "__VIEWSTATE=%2FwEPDwUKMTE5MDg0MzIzM2QYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFCWltYl9sb2dpbgtoYQyQQGMGv%2FcyvjeVOFG%2FhKtH&__EVENTVALIDATION=%2FwEWBAKKxOr4DQK3u5vhBALH%2FIHIDwLy7cL8Avc7%2FoWPCUSNmf%2B6pyue9ytCp6Ki&txt_username=Name&imb_login.x=28&imb_login.y=1&txt_password=password";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        // Set the ContentType property of the WebRequest.
        request.ContentType = "application/x-www-form-urlencoded";
        // Set the ContentLength property of the WebRequest.
        request.ContentLength = byteArray.Length;
        request1.ContentLength = byteArray.Length;
        // Get the request stream.
        using (Stream dataStream = request.GetRequestStream())
        {
            dataStream.Write(byteArray, 0, byteArray.Length);
            dataStream.Close();
        }
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        // Print the properties of each cookie.
        foreach (Cookie cook in response.Cookies)
        {
            Cookie oC = new Cookie();
            // Convert between the System.Net.Cookie to a System.Web.HttpCookie...
            oC.Domain = request.RequestUri.Host;
            oC.Expires = cook.Expires;
            oC.Name = cook.Name;
            oC.Path = cook.Path;
            oC.Secure = cook.Secure;
            oC.Value = cook.Value;
            request1.CookieContainer.Add(oC);
            Console.WriteLine(oC.ToString());
        }
        response.Close();
        response = (HttpWebResponse)request1.GetResponse();
        Stream resStream = response.GetResponseStream();
        string tempString = null;
        int count = 0;
        do
        {
            // fill the buffer with data
            count = resStream.Read(buf, 0, buf.Length);
            // make sure we read some data
            if (count != 0)
            {
                // translate from bytes to ASCII text
                tempString = Encoding.ASCII.GetString(buf, 0, count);
                // continue building the string
                sb.Append(tempString);
            }
        }
        while (count > 0); // any more data to read?
        // print out page source
        Console.WriteLine(sb.ToString());
        response.Close();
        Console.ReadLine();
    }
}
You should put the same CookieContainer into both requests.
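A minimal sketch of that change, reusing the URLs from your code (the POST body write is elided):

// One container shared by both requests, so the session cookies the
// login response sets are sent automatically on the second request.
CookieContainer cookies = new CookieContainer();

HttpWebRequest loginRequest = (HttpWebRequest)WebRequest.Create("url_1");
loginRequest.Method = "POST";
loginRequest.ContentType = "application/x-www-form-urlencoded";
loginRequest.CookieContainer = cookies; // response cookies land in here
// ... write the POST body and call loginRequest.GetResponse() as before ...

HttpWebRequest pageRequest = (HttpWebRequest)WebRequest.Create("url_2");
pageRequest.CookieContainer = cookies;  // same container: login cookies are replayed
// note: pageRequest is a GET, so do not set ContentLength on it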
However, your approach is flawed: ASP.NET generates __VIEWSTATE and __EVENTVALIDATION when it renders the page, so values copied from an earlier session cannot simply be replayed.
You need to request the login form first, read its form fields (using the HTML Agility Pack), and build the POST body from them.
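A sketch of that idea, assuming the HTML Agility Pack NuGet package; the visible field names are taken from your postData string, and the helper itself is hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LoginFormHelper
{
    // Build a POST body that matches the login form the server just sent us.
    public static string BuildLoginPostData(string loginPageHtml, string user, string pass)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(loginPageHtml);

        var fields = new Dictionary<string, string>();

        // Copy every hidden field (__VIEWSTATE, __EVENTVALIDATION, ...) exactly
        // as the server rendered it for this particular response.
        foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input[@type='hidden']"))
        {
            fields[input.GetAttributeValue("name", "")] =
                input.GetAttributeValue("value", "");
        }

        // Fill in the visible fields (names taken from the question's postData).
        fields["txt_username"] = user;
        fields["txt_password"] = pass;
        fields["imb_login.x"] = "1";
        fields["imb_login.y"] = "1";

        return string.Join("&",
            fields.Select(kv => Uri.EscapeDataString(kv.Key) + "=" +
                                Uri.EscapeDataString(kv.Value)));
    }
}

Fetch url_1 first, pass its HTML through this helper, then POST the result back with the shared CookieContainer described above.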
At the risk of derailing your efforts, can I suggest instead using Selenium and calling it from C#? Since it drives a real Firefox browser and actually executes your web requests with it, it takes care of a lot of the low-level plumbing.
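A rough sketch with the Selenium WebDriver bindings; the element IDs here are guesses based on the field names in your POST data:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class SeleniumScrape
{
    static void Main()
    {
        IWebDriver driver = new FirefoxDriver();
        try
        {
            // Fill in the login form like a real user would; the browser
            // handles ViewState, EventValidation and cookies by itself.
            driver.Navigate().GoToUrl("url_1");
            driver.FindElement(By.Id("txt_username")).SendKeys("Name");
            driver.FindElement(By.Id("txt_password")).SendKeys("password");
            driver.FindElement(By.Id("imb_login")).Click();

            // The session is now established, so the protected page loads.
            driver.Navigate().GoToUrl("url_2");
            Console.WriteLine(driver.PageSource);
        }
        finally
        {
            driver.Quit();
        }
    }
}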