Downloading PDF content from a website
I'm trying to download a PDF to my desktop. The PDF updates about every couple of days with new content, and I was trying to see if there is a way to have the local copy update itself automatically when there is fresh content, without having to go to the actual link:
http://www.uakron.edu/dotAsset/1265971.pdf
Assuming this is even remotely a programming question, you could try an HTTP HEAD request (ideally sending an If-Modified-Since header with it) and inspect the response headers. If the server is friendly, it will tell you that nothing has changed via a 304 (Not Modified) response code.
If you don't get a 304, then issue a GET request and save the response stream.
You could also just issue a GET with If-Modified-Since (skipping the HEAD); but a HEAD request might save some bandwidth if the server doesn't honor the conditional GET and sends the full body instead of a 304.
Not tested extensively, but:
    using System;
    using System.IO;
    using System.Net;

    static class Program
    {
        static void Main()
        {
            string url = "http://www.uakron.edu/dotAsset/1265971.pdf", localPath = "1265971.pdf";
            var req = (HttpWebRequest)WebRequest.Create(url);
            // AutomaticDecompression also sends the matching Accept-Encoding header,
            // so there is no need to add that header by hand.
            req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
            // Conditional GET: ask the server to skip the body if our copy is still current.
            if (File.Exists(localPath))
                req.IfModifiedSince = File.GetLastWriteTimeUtc(localPath);
            try
            {
                using (var resp = req.GetResponse())
                using (var data = resp.GetResponseStream())
                using (var file = File.Create(localPath))
                {
                    // Copy until end of stream rather than trusting ContentLength, which can be
                    // -1 (unknown) or the compressed size when decompression is enabled.
                    data.CopyTo(file);
                }
                Console.WriteLine("New version downloaded");
            }
            catch (WebException ex)
            {
                // HttpWebRequest reports a 304 Not Modified as a protocol error;
                // anything else (404, 500, network failure) is rethrown.
                var resp = ex.Response as HttpWebResponse;
                if (resp == null || resp.StatusCode != HttpStatusCode.NotModified)
                    throw;
                Console.WriteLine("Not updated");
            }
        }
    }
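For comparison, the HEAD-first approach mentioned above might look roughly like the sketch below. It is untested against this particular server. The names `HeadFirst`, `NeedsDownload` and `Probe` are made up for illustration, and it assumes the server answers HEAD requests and honors If-Modified-Since.

```csharp
using System;
using System.IO;
using System.Net;

static class HeadFirst
{
    // Hypothetical helper: only a 200 OK means there is a newer copy worth fetching.
    public static bool NeedsDownload(HttpStatusCode status)
    {
        return status == HttpStatusCode.OK;
    }

    // Sends a HEAD request (headers only, no body) with If-Modified-Since
    // and returns the status code the server answered with.
    public static HttpStatusCode Probe(string url, DateTime since)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";
        req.IfModifiedSince = since;
        try
        {
            using (var resp = (HttpWebResponse)req.GetResponse())
                return resp.StatusCode;
        }
        catch (WebException ex)
        {
            // A 304 arrives as a protocol error; rethrow everything else.
            var resp = ex.Response as HttpWebResponse;
            if (resp == null || resp.StatusCode != HttpStatusCode.NotModified)
                throw;
            resp.Close();
            return HttpStatusCode.NotModified;
        }
    }

    static void Main()
    {
        string url = "http://www.uakron.edu/dotAsset/1265971.pdf", localPath = "1265971.pdf";

        // With no local copy there is nothing to compare against, so skip the probe.
        bool download = !File.Exists(localPath)
            || NeedsDownload(Probe(url, File.GetLastWriteTimeUtc(localPath)));

        if (!download)
        {
            Console.WriteLine("Not updated");
            return;
        }

        // Plain GET for the full file once we know it has changed.
        using (var client = new WebClient())
            client.DownloadFile(url, localPath);
        Console.WriteLine("New version downloaded");
    }
}
```

This costs an extra round trip whenever the file has changed, so it only pays off if the server would otherwise send the whole PDF in response to a conditional GET.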