Download PDFs through proxy
I have a list of URLs that link directly to PDFs on a database website. Automating the download would be easy, except that I have to access the website through a proxy server. This is the code I've been trying to use:
public void Download()
{
    WebClient wb2 = new WebClient();
    WebProxy proxy = new WebProxy("PROXY_URL:port", true);
    proxy.Credentials = new NetworkCredential("USERNAME", "PASSWORD");
    GlobalProxySelection.Select = proxy;
    try
    {
        for (int i = 0; i < URLList.Length; i++)
        {
            byte[] Data = DownloadData(URLList[i]);
            FileStream fs = new FileStream(@"D:\Files\" + i.ToString() + ".pdf", FileMode.Create);
            fs.Write(Data, 0, Data.Length);
            fs.Close();
        }
    }
    catch (WebException WebEx)
    {
        MessageBox.Show(WebEx.Message);
    }
}

public byte[] DownloadData(string path)
{
    WebClient wb2 = new WebClient();
    wb2.Credentials = new NetworkCredential("USERNAME", "PASSWORD");
    return wb2.DownloadData(path);
}
For some reason, it returns the error "(400): Bad Request" every time. I'm obviously able to get to these PDFs just fine through Firefox, so I'm wondering what I'm doing wrong here. I'm fairly new to programming in general, and very new to web protocols through C#. Any help would be appreciated.
Use Fiddler to work out the difference between the request your code sends and the one your browser sends.
The 400 error indicates a malformed request, as opposed to the proxy denying you (407) or the site requiring authentication (401).
Incidentally, the line "wb2.Credentials = ..." sends your username and password to the target server. Is this intended?
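To tell those three cases apart from code rather than from the message text, you can inspect the response attached to the WebException. This is only a rough sketch: the class name StatusProbe and the wording of the returned strings are made up for illustration, and it assumes the exception actually carries an HTTP response.

```csharp
using System;
using System.Net;

static class StatusProbe
{
    // Maps the status codes discussed above to their likely cause.
    public static string Describe(HttpStatusCode code)
    {
        switch (code)
        {
            case HttpStatusCode.BadRequest:                  // 400
                return "malformed request";
            case HttpStatusCode.Unauthorized:                // 401
                return "target site requires authentication";
            case HttpStatusCode.ProxyAuthenticationRequired: // 407
                return "proxy rejected or is missing credentials";
            default:
                return "other status: " + (int)code;
        }
    }

    // Pulls the HTTP status out of a WebException, if a response was received at all.
    public static string Describe(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response == null
            ? "no HTTP response (" + ex.Status + ")"
            : Describe(response.StatusCode);
    }
}
```

In the catch block from the question, MessageBox.Show(StatusProbe.Describe(WebEx)) would then show which of the cases you are hitting.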
I haven't used WebClient for a while, but you can use var request = HttpWebRequest.Create(url); request.Proxy = proxy; request.GetResponse().GetResponseStream() and read the bytes using a BinaryReader.
That gives you a byte array that you can write to a file using File.WriteAllBytes() rather than having to use a FileStream.
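Roughly, that approach could look like the sketch below. The names ProxyDownloader, ReadAllBytes and DownloadToFile are hypothetical, and I've swapped the BinaryReader for Stream.CopyTo into a MemoryStream (available from .NET 4), which avoids having to know the length up front. Note that newer .NET versions mark WebRequest as obsolete in favour of HttpClient, so treat this as an illustration of the idea rather than the recommended API.

```csharp
using System;
using System.IO;
using System.Net;

static class ProxyDownloader
{
    // Reads a stream to the end and returns its contents as a byte array.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Downloads one URL through the given proxy and saves it to targetPath.
    public static void DownloadToFile(string url, IWebProxy proxy, string targetPath)
    {
        WebRequest request = WebRequest.Create(url);
        request.Proxy = proxy; // proxy is set on this request only, not globally

        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            File.WriteAllBytes(targetPath, ReadAllBytes(body));
        }
    }
}
```

With the WebProxy object from the question (credentials set on the proxy, not on the request), the loop body would become ProxyDownloader.DownloadToFile(URLList[i], proxy, @"D:\Files\" + i + ".pdf").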
hth