Trying to access a web page behind a login in C# to do web scraping

OK, I am trying to grab some data off an internal web site that is run by a third party. I seem to be able to log in via a POST request and get the required cookies, but when I attempt to access the page behind the login page, it doesn't work; it still returns the original login page. Here is the code. I am not sure what I am doing wrong.

public class FetchData
{
    public static void Main(string[] args)
    {
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("url_1");
        //url behind website login
        HttpWebRequest request1 = (HttpWebRequest)WebRequest.Create("url_2");
        request.Method = "POST";
        request.CookieContainer = new CookieContainer();
        request1.CookieContainer = new CookieContainer();
        string postData = "__VIEWSTATE=%2FwEPDwUKMTE5MDg0MzIzM2QYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFCWltYl9sb2dpbgtoYQyQQGMGv%2FcyvjeVOFG%2FhKtH&__EVENTVALIDATION=%2FwEWBAKKxOr4DQK3u5vhBALH%2FIHIDwLy7cL8Avc7%2FoWPCUSNmf%2B6pyue9ytCp6Ki&txt_username=Name&imb_login.x=28&imb_login.y=1&txt_password=password";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        // Set the ContentType property of the WebRequest.
        request.ContentType = "application/x-www-form-urlencoded";
        // Set the ContentLength property of the WebRequest.
        request.ContentLength = byteArray.Length;
        request1.ContentLength = byteArray.Length;
        // Get the request stream.
        using (Stream dataStream = request.GetRequestStream())
        {
            dataStream.Write(byteArray, 0, byteArray.Length);
            dataStream.Close();
        }

        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        // Print the properties of each cookie.
        foreach (Cookie cook in response.Cookies)
        {
            Cookie oC = new Cookie();

            // Convert between the System.Net.Cookie to a System.Web.HttpCookie...
            oC.Domain = request.RequestUri.Host;
            oC.Expires = cook.Expires;
            oC.Name = cook.Name;
            oC.Path = cook.Path;
            oC.Secure = cook.Secure;
            oC.Value = cook.Value;

            request1.CookieContainer.Add(oC);
            Console.WriteLine(oC.ToString());
        }

        response.Close();
        response = (HttpWebResponse)request1.GetResponse();

        Stream resStream = response.GetResponseStream();

        string tempString = null;
        int count = 0;

        do
        {
            // fill the buffer with data
            count = resStream.Read(buf, 0, buf.Length);

            // make sure we read some data
            if (count != 0)
            {
                // translate from bytes to ASCII text
                tempString = Encoding.ASCII.GetString(buf, 0, count);

                // continue building the string
                sb.Append(tempString);
            }
        }
        while (count > 0); // any more data to read?

        // print out page source
        Console.WriteLine(sb.ToString());
        response.Close();

        Console.ReadLine();
    }
}


You should put the same CookieContainer into both requests.
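A minimal sketch of what that looks like (the URLs are placeholders, and the helper name is my own): create both requests up front and assign them the same `CookieContainer` instance, so the session cookie set by the login response is replayed automatically on the follow-up request without any manual copying.

```csharp
using System.Net;

static class CookieSetup
{
    // Create both requests with ONE shared CookieContainer. Cookies received
    // on the login response land in the container and are sent automatically
    // with the second request -- no per-cookie copying needed.
    public static (HttpWebRequest login, HttpWebRequest page) CreateRequests(
        string loginUrl, string pageUrl)
    {
        var cookies = new CookieContainer();
        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        var page = (HttpWebRequest)WebRequest.Create(pageUrl);
        login.CookieContainer = cookies;
        page.CookieContainer = cookies;
        return (login, page);
    }
}
```

This also removes the loop over `response.Cookies`, which only sees cookies the server happened to expose on that one response.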

However, your approach is flawed: ASP.NET ViewState and EventValidation values are generated per page render and cannot be replayed from an earlier capture.

You need to request the original form, read the form elements (using the HTML Agility Pack), and build a POST from them.
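A rough sketch of that scrape-then-post step, under some assumptions: a plain regex stands in for the HTML Agility Pack here just to keep the example self-contained, and the `txt_username`/`txt_password` field names are taken from the question's `postData`. You would fetch the login page (through the shared `CookieContainer`), pull out the current hidden fields, and build a fresh POST body from them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FormScraper
{
    // Pull hidden-field name/value pairs (__VIEWSTATE, __EVENTVALIDATION, ...)
    // out of the login page HTML. A regex is used here only to keep the sketch
    // dependency-free; the HTML Agility Pack is more robust on real markup.
    public static Dictionary<string, string> GetHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        var pattern = "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
            fields[m.Groups[1].Value] = m.Groups[2].Value;
        return fields;
    }

    // Combine the freshly scraped hidden fields with the credentials and
    // url-encode everything into a POST body.
    public static string BuildPostBody(Dictionary<string, string> fields,
                                       string user, string password)
    {
        fields["txt_username"] = user;      // field names from the question's postData
        fields["txt_password"] = password;
        return string.Join("&",
            fields.Select(kv => Uri.EscapeDataString(kv.Key) + "=" +
                                Uri.EscapeDataString(kv.Value)));
    }
}
```

The resulting string goes into the login request's body exactly where the hard-coded `postData` was, and because the ViewState came from the page you just fetched, the server will accept it.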


At the risk of derailing your efforts, can I suggest using Selenium instead and driving it from C#? Since it launches a real Firefox browser and actually executes your web requests through it, it takes care of a lot of the low-level plumbing (cookies, ViewState, JavaScript) for you.
