Continuously download latest changes from a file
I have no idea how to even get started on this. I have little to no knowledge about downloading files, besides a tutorial I once followed that downloaded a simple text file.
Allow me to explain what kind of file I should be downloading. I have a game that records a local demo. This file keeps growing by adding so-called frames. What we're trying to achieve is to download this file entirely once, and after that fetch only the latest additions to it rather than the entire file again. This allows us to play back the demo on a remote system while it is still being created.
We have successfully done this using wget, but we wanted to write a user-friendly client around the download mechanism. What wget does is check whether the file has changed, and then fetch only the bytes that were added since the last download. The file grows by about 40 KB/s. This way we can easily set up a stream to a remote system.
It's not an option to redownload the entire file every time. We managed to check whether the online file had changed, but when a change was detected it downloaded the entire file again. These files can eventually grow to 15 MB, and because of this size we can't provide a quick download that skips to the current frame in the game.
Sources, tutorials, or even just the download code with a small explanation how it works would help our project a lot.
Thanks in advance.
Simple implementation
- Do a HEAD request
- Read the Content-Length header
- Use a byte range (the HTTP Range header) to request only the new part of the file, by comparing the local length with Content-Length (just like a download manager's resume feature)
- Append it to your local file
Done.
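The four steps above can be sketched in C# with `HttpClient`. This is a minimal sketch, assuming the server reports Content-Length on HEAD and supports byte ranges (answers 206 Partial Content); `IncrementalDownloader`, `MissingRange` and `SyncAsync` are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class IncrementalDownloader
{
    private static readonly HttpClient Http = new HttpClient();

    // Byte range still missing locally, or null when the local copy is already current.
    public static RangeHeaderValue MissingRange(long localLength, long remoteLength)
    {
        if (remoteLength <= localLength)
            return null;                                            // nothing new to fetch
        return new RangeHeaderValue(localLength, remoteLength - 1); // end offset is inclusive
    }

    // HEAD -> compare lengths -> ranged GET -> append. Returns the number of bytes appended.
    public static async Task<long> SyncAsync(Uri url, string localPath)
    {
        long localLength = File.Exists(localPath) ? new FileInfo(localPath).Length : 0;

        // Steps 1 and 2: HEAD returns only the headers, including Content-Length
        using var head = new HttpRequestMessage(HttpMethod.Head, url);
        using var headResponse = await Http.SendAsync(head);
        headResponse.EnsureSuccessStatusCode();
        long? remoteLength = headResponse.Content.Headers.ContentLength;
        if (remoteLength == null)
            return 0;                                               // server did not report a length

        // Step 3: request only the bytes we do not have yet (Range: bytes=local-end)
        RangeHeaderValue range = MissingRange(localLength, remoteLength.Value);
        if (range == null)
            return 0;
        using var get = new HttpRequestMessage(HttpMethod.Get, url);
        get.Headers.Range = range;
        using var response = await Http.SendAsync(get, HttpCompletionOption.ResponseHeadersRead);
        if (response.StatusCode != HttpStatusCode.PartialContent)
            throw new InvalidOperationException("Server did not honour the Range header");

        // Step 4: append the new bytes to the local file
        using (var output = new FileStream(localPath, FileMode.Append, FileAccess.Write))
        using (var body = await response.Content.ReadAsStreamAsync())
        {
            await body.CopyToAsync(output);
        }
        return new FileInfo(localPath).Length - localLength;
    }
}
```

Calling `SyncAsync` in a loop with a `Task.Delay` between iterations gives the "keep fetching the new frames" behaviour. The sketch refuses a 200 OK reply instead of appending it, because a server that ignores Range sends the whole file and appending it would corrupt the local copy.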
Assuming you are using HTTP requests to retrieve your file, the following should help you. I take advantage of both the Last-Modified and Content-Length values and poll the file for changes. You may want to change that: if the file is touched without being updated, or if the headers change for any other reason, this isn't a great way to detect changes. However, this should get you moving in the right direction.
If you get really motivated, you could move the polling code into the thread I use in the program, make a "FileUpdatedEventArgs" class, and pass changes back via events. Or maybe you just stick with polling it yourself. ;-)
```csharp
using System;
using System.IO;
using System.Net;

public class SegmentedDownloader
{
    #region Class Variables
    /// <summary>
    /// Date the file was last updated.
    /// Sent as If-Modified-Since so the server can answer 304 when nothing changed.
    /// </summary>
    protected DateTime LastModifiedSince = default(DateTime);
    /// <summary>
    /// Number of bytes downloaded so far
    /// (used as the starting offset of the next Range request)
    /// </summary>
    protected Int64 ContentLength = default(Int64);
    /// <summary>
    /// The file we're polling
    /// </summary>
    protected Uri FileLocation;
    #endregion

    #region Construct
    /// <summary>
    /// Create a new downloader pointing to the specific file location
    /// </summary>
    /// <param name="URL">URL of the file</param>
    public SegmentedDownloader(String URL)
        : this(new Uri(URL))
    {
    }

    public SegmentedDownloader(Uri URL)
    {
        this.FileLocation = URL;
    }
    #endregion

    /// <summary>
    /// Grab the bytes added to the file since the last call.
    /// The offset is advanced immediately, so the caller must read the whole stream.
    /// </summary>
    /// <returns>Stream with the changes, or null if nothing changed (or on error)</returns>
    public Stream GetLatest()
    {
        try
        {
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(this.FileLocation);
            // Only ask for the bytes after what we already have (Range: bytes=N-)
            if (this.ContentLength > 0)
                webRequest.AddRange(this.ContentLength);
            // Only send If-Modified-Since once we actually know a date
            if (this.LastModifiedSince != default(DateTime))
                webRequest.IfModifiedSince = this.LastModifiedSince;

            HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
            Int64 contentLength = webResponse.ContentLength;
            if (contentLength <= 0)
            {
                // Nothing new, or the length is unknown (-1): keep the current offset
                webResponse.Close();
                return null;
            }

            if (webResponse.StatusCode == HttpStatusCode.PartialContent)
                this.ContentLength += contentLength;  // 206: server honoured the Range header
            else
                this.ContentLength = contentLength;   // 200: the whole file was sent again
            this.LastModifiedSince = webResponse.LastModified;
            return webResponse.GetResponseStream();
        }
        catch (WebException wex)
        {
            // 304 Not Modified (unchanged since If-Modified-Since) and
            // 416 Range Not Satisfiable (no new bytes) both arrive here
            if (wex.Response != null)
                wex.Response.Close();
            return null;
        }
    }
}
```
It could be used like this:
```csharp
using System;
using System.IO;
using System.Threading;

class Program
{
    static TimeSpan updateInterval = TimeSpan.FromSeconds(5);
    static Thread tWorker;
    static ManualResetEvent tReset;

    static void Main(string[] args)
    {
        tReset = new ManualResetEvent(false);
        tWorker = new Thread(new ThreadStart(PollForUpdates));
        tWorker.Start();

        Console.Title = "Press ENTER to stop";
        Console.ReadLine();

        // Tell the worker to stop and wait for it to finish
        tReset.Set();
        tWorker.Join();
    }

    static void PollForUpdates()
    {
        SegmentedDownloader segDL = new SegmentedDownloader("http://localhost/dataFile.txt");
        do
        {
            Stream fileData = segDL.GetLatest();
            if (fileData != null)
            {
                // Disposing the reader also closes the response stream
                using (StreamReader fileReader = new StreamReader(fileData))
                {
                    string line;
                    while ((line = fileReader.ReadLine()) != null)
                        Console.WriteLine(line);
                }
            }
        }
        // WaitOne returns true as soon as tReset is set, otherwise false after updateInterval
        while (!tReset.WaitOne(updateInterval));
    }
}
```
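The event-based variant suggested in this answer could look roughly like this. It is a sketch under assumptions: `FileUpdatedEventArgs` and `FilePoller` are hypothetical names, and the poller takes any `Func<Stream>` (such as `segDL.GetLatest`) so it does not depend on a particular downloader.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical event payload: the bytes downloaded in one poll.
public class FileUpdatedEventArgs : EventArgs
{
    public byte[] NewData { get; }
    public FileUpdatedEventArgs(byte[] newData) { NewData = newData; }
}

// Polls a fetch function on a background thread and raises FileUpdated when it returns data.
public class FilePoller : IDisposable
{
    private readonly Func<Stream> fetchLatest;   // e.g. segDL.GetLatest
    private readonly TimeSpan interval;
    private readonly ManualResetEvent stop = new ManualResetEvent(false);
    private Thread worker;

    public event EventHandler<FileUpdatedEventArgs> FileUpdated;

    public FilePoller(Func<Stream> fetchLatest, TimeSpan interval)
    {
        this.fetchLatest = fetchLatest;
        this.interval = interval;
    }

    public void Start()
    {
        worker = new Thread(Run) { IsBackground = true };
        worker.Start();
    }

    private void Run()
    {
        do
        {
            Stream changes = fetchLatest();
            if (changes != null)
            {
                // Copy the new bytes out so the network stream can be closed right away
                using (changes)
                using (var buffer = new MemoryStream())
                {
                    changes.CopyTo(buffer);
                    if (buffer.Length > 0)
                        FileUpdated?.Invoke(this, new FileUpdatedEventArgs(buffer.ToArray()));
                }
            }
        }
        while (!stop.WaitOne(interval));   // true once Dispose() signals stop
    }

    public void Dispose()
    {
        stop.Set();
        worker?.Join();
        stop.Dispose();
    }
}
```

Usage would be along the lines of `new FilePoller(segDL.GetLatest, TimeSpan.FromSeconds(5))`, subscribing to `FileUpdated` and calling `Start()`. Note that the event fires on the worker thread, so a UI client would need to marshal back to its UI thread before touching controls.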
First, don't think of it as a file. It sounds like what you're dealing with is a stream of modification packets over time. If the base file on the local machine is missing or out of date, then you have some additional downloading to do to establish the base, but once that is in place the code that will be used most of the time is receiving update frames and applying them to the base, either by appending or overlaying the original data.
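To make "applying frames to the base" concrete, here is a minimal sketch assuming each update frame carries a byte offset and the bytes to write there (`UpdateFrame` and `FrameApplier` are hypothetical names). A frame whose offset equals the current length appends; a smaller offset overlays existing data.

```csharp
using System;
using System.IO;

// Hypothetical update frame: bytes to write at a given offset of the base file.
public sealed class UpdateFrame
{
    public long Offset { get; }
    public byte[] Data { get; }
    public UpdateFrame(long offset, byte[] data) { Offset = offset; Data = data; }
}

public static class FrameApplier
{
    // Offset == current length appends; a smaller offset overwrites existing bytes
    // (and grows the stream if the frame runs past the current end).
    public static void Apply(Stream baseFile, UpdateFrame frame)
    {
        if (frame.Offset > baseFile.Length)
            throw new InvalidOperationException("Gap before frame: the base is missing earlier data");
        baseFile.Seek(frame.Offset, SeekOrigin.Begin);
        baseFile.Write(frame.Data, 0, frame.Data.Length);
    }
}
```

Rejecting frames that would leave a gap is what keeps the base consistent: a client that sees the exception knows it has to fetch the missing data first.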
It might be helpful to think of this like a version control system, or at least to get familiar with version control terms and approach your solution using similar concepts. Things like: every revision has a unique signature (usually a hash or digest of the actual data), and there is an order associated with the signatures. If you can require that the client always obtain update frames in sequential order (don't allow the client to have frame 200 if it does not yet have frames 150-199), then the client's request to the server can simply be "I have xyz. What do I need to be current with latest?"
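The "I have xyz. What do I need?" exchange can be sketched as a server-side log of sequentially numbered frames. This assumes frames are numbered 1, 2, 3, ... in arrival order; `FrameLog` and `GetSince` are illustrative names, and how a real server would expose `GetSince` over HTTP (for example, a query parameter carrying the last sequence number) is an assumption left open here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical server-side log of update frames, numbered 1, 2, 3, ... in arrival order.
public sealed class FrameLog
{
    private readonly List<byte[]> frames = new List<byte[]>();
    private readonly object gate = new object();

    // Records a new frame and returns its sequence number.
    public long Append(byte[] frame)
    {
        lock (gate)
        {
            frames.Add(frame);
            return frames.Count;
        }
    }

    // "I have frame lastSeq. What do I need?" -> every later frame, in order.
    public IReadOnlyList<KeyValuePair<long, byte[]>> GetSince(long lastSeq)
    {
        lock (gate)
        {
            if (lastSeq < 0 || lastSeq > frames.Count)
                throw new ArgumentOutOfRangeException(nameof(lastSeq));
            var missing = new List<KeyValuePair<long, byte[]>>();
            for (int i = (int)lastSeq; i < frames.Count; i++)
                missing.Add(new KeyValuePair<long, byte[]>(i + 1, frames[i])); // frames[i] is frame i+1
            return missing;
        }
    }
}
```

Because the server always returns everything after the client's last frame, the client can never hold frame 200 without frames 150-199.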
If your data changes happen quickly, especially if they are the result of multiple clients acting concurrently on the shared document, then using time signatures alone probably won't be unique enough to be reliable. A time signature + a digital hash of the content would probably be a good idea, and I believe that is what most version control systems use in some fashion.
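One way to build a "time signature + hash" revision signature, assuming SHA-256 as the digest and an ISO 8601 UTC timestamp (both are choices made for this sketch, not requirements; `RevisionSignature` is a hypothetical name):

```csharp
using System;
using System.Security.Cryptography;

public static class RevisionSignature
{
    // "<UTC timestamp>:<hex SHA-256 of the content>". Two revisions recorded in the
    // same clock tick still get different signatures unless their bytes are identical.
    public static string Create(DateTime utcTimestamp, byte[] content)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(content);
            string hex = BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
            return utcTimestamp.ToString("o") + ":" + hex;
        }
    }
}
```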
It might also be worthwhile to consider using a version control system as your core implementation and build your server and client around that instead of writing your own. This will be particularly useful if you have any requirements for your app to allow users to "go back" to previous revisions or take the current document and make private changes that are not seen by everyone else (a private branch in version control terms).
OK, I've read some of the extra questions that were asked here. We are accessing the file over HTTP; it is pushed onto a web server. The pushing mechanism is something we can deal with later. Right now we are using ugly batch files that launch a program that keeps checking for changes locally, but this is invisible to the user, so it's a worry for later.
For the other 2 questions, this is roughly how it works. The game, connected to a game server, records a demo locally -> this demo gets uploaded to webspace -> this demo should get downloaded by a local user, who can then play back the demo.