How to query data from a password-protected HTTPS website
I'd like my application to query a CSV file from a secure website. I have no experience with web programming, so I'd appreciate detailed instructions. Currently I have the user log in to the site, manually query the CSV, and have my application load the file locally. I'd like to automate this by having the user enter his login information, authenticating him on the website, and querying the data. The application is written in C# .NET.
I've tested the following code already and am able to access the file once the user has already authenticated himself and created a manual query.
System.Net.WebClient Client = new WebClient();
Stream strm = Client.OpenRead("https://<URL>/file.csv");
Here is the request sent to the site for authentication. I've angle bracketed the real userid and password.
POST /pwdVal.asp HTTP/1.1
Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; Tablet PC 2.0; OfficeLiveConnector.1.4; OfficeLivePatch.1.3; .NET4.0C; .NET4.0E)
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Cookie: ASPSESSIONID<unsure if this data contained password info so removed>; ClientId=<username>
Host: www3.emidas.com
Content-Length: 36
Connection: Keep-Alive
Cache-Control: no-cache
Accept-Language: en-US

client_id=<username>&password=<password>
Most likely the server sends a cookie once login is performed. You need to submit the same values as the login form (this can be done using UploadValues()), and you need to save the resulting cookies in a CookieContainer so that they are sent along with later requests.
When I did this, I did it using HttpWebRequest, however per http://couldbedone.blogspot.com/2007/08/webclient-handling-cookies.html you can subclass WebClient and override the GetWebRequest() method to make it support cookies.
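As a rough sketch of the WebClient approach described above: the class below overrides GetWebRequest() to attach a shared CookieContainer, then posts the login form with UploadValues() and downloads the file with the same client. The names CookieAwareWebClient, LoginExample and LoginAndDownload are made up for illustration, and the field names client_id and password are taken from the captured request in the question. It assumes the site needs nothing beyond the session cookie, which may not hold.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Hypothetical name: a WebClient that keeps cookies across requests.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer _cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Every HTTP request made by this client shares one cookie jar.
            httpRequest.CookieContainer = _cookies;
        }
        return request;
    }
}

public static class LoginExample
{
    // Hypothetical helper: logs in, then downloads the CSV as text.
    // loginUrl and csvUrl are supplied by the caller; they are not known here.
    public static string LoginAndDownload(
        string loginUrl, string csvUrl, string userId, string password)
    {
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            NameValueCollection form = new NameValueCollection();
            form["client_id"] = userId;   // field names as seen in the captured POST
            form["password"] = password;

            // UploadValues URL-encodes the pairs and sends them as
            // application/x-www-form-urlencoded.
            client.UploadValues(loginUrl, "POST", form);

            // The login cookies are now in the jar and go out with this request.
            return client.DownloadString(csvUrl);
        }
    }
}
```

If the site expects an intermediate "query" request before the file exists, that request would have to be replayed with the same client between the two calls.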
Oh, also, I found it useful to use Fiddler while manually accessing the web site to see what actually gets sent back and forth to the web site, so I knew what I was trying to reproduce.
edit, elaboration requested: I can only elaborate how to do it using HttpWebRequest, I have not done it using WebClient. Below is the code snippet I used for login.
private CookieContainer _jar = new CookieContainer();
private string _password;
private string _userid;
private string _url;
private string _userAgent;
...
string responseData;
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(_url);
webRequest.CookieContainer = _jar; // cookies set by the login response end up in _jar
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.UserAgent = _userAgent;
// URL-encode the values so that characters such as '&' or '=' do not corrupt the form body
string requestBody = String.Format(
    "client_id={0}&password={1}",
    Uri.EscapeDataString(_userid), Uri.EscapeDataString(_password));
try
{
    // Write the form body and close the request stream before asking for the response
    using (StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(requestBody);
    }
    using (HttpWebResponse res = (HttpWebResponse)webRequest.GetResponse())
    using (StreamReader responseReader = new StreamReader(res.GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
        if (res.StatusCode != HttpStatusCode.OK)
            throw new WebException("Logon failed", null, WebExceptionStatus.Success, res);
    }
}
catch (WebException)
{
    // Handle or log the failure here (the handler was not part of the snippet), then rethrow
    throw;
}
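Once the login request has filled the cookie jar, the file itself can be requested with the same CookieContainer. The following is only a sketch under that assumption; CsvFetcher and DownloadCsv are hypothetical names, and the caller is expected to pass in the jar used for the login (_jar in the snippet above) together with the real CSV URL.

```csharp
using System;
using System.IO;
using System.Net;

public static class CsvFetcher
{
    // Hypothetical helper: downloads csvUrl as text, sending the cookies held in 'jar'.
    public static string DownloadCsv(string csvUrl, CookieContainer jar, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(csvUrl);
        request.CookieContainer = jar; // same jar that received the login cookies
        request.Method = "GET";
        request.UserAgent = userAgent;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Whether this single GET is enough depends on the site. The question mentions that the file is available after a manual query, so an extra request reproducing that query may be needed first.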
Before you go down this rabbit hole, contact the web site and ask them if they provide a web service to query user account info from. The simulated login method you are proposing should be a last resort only.
Another way you can do it is to automate IE, e.g. use a WebBrowser control. That will more accurately simulate all the clever stuff that IE does, like running JavaScript, which might be necessary. If JavaScript or other clever stuff isn't necessary, though, using IE is a little heavy-handed and possibly prone to other problems.