Best way to store multiple revisions of a text file to a single file
I'm working on a C# application that needs to store all the successive revisions of a given report f开发者_JAVA百科ile to a single project file: each time the (plain text) report file changes, the contents of the new version shall be appended to the project file, along with some metadata. Other requirements:
- each version of the report file is 100 kB to 1 MB. Theoritically, the maximum number of revisions is unlimited but it should be less than 1000 in practice.
- to keep things simple, I'd like to avoid computing differences between the revisions of the report - just store the whole report to the project file every time it has changed.
- the project file should be compressed - it doesn't need to be a text file
- it should be easy to retrieve a given version of the report from the application
How can I implement this in an efficient way? Should I create a custom binary file, consider using a database, other ideas?
Many thanks, Guy.
What's wrong with the simple workflow?
- Un-gzip file
- Append header and new report
- Gzip project file
Gzip is a standard format, so it's easily accessible. Subsequent reports probably won't change that much, so you'll have a great compression ratio. To file every report, just open the file and scan the headers. (If scanning doesn't work, also mirror the metadata in an SQLite database, and make sure to include offsets into the project file so you can seek to the right place quickly.)
If your requirements are flexible (e.g. that "shall append" part) and you just want something to keep track of past versions of the file, a revision control system will do all of what you need quite easily.
No need to implement that. I would suggest you to use source control. Personally I use subversion with TortoiseSVN client. There is also a plug-in that integrates Subversion with Visual Studio, VisualSVN. Have a look at them.
You could use alternate data streams to store the old revisions of your file. There is no built-in support in the .NET framework, but there exist some helper classes and articles like here and here.
I have never used this myself, so I can't really tell if this is a good option. But it seems, it would make an elegant solution, since you could store each file version in a separate data stream and only the current version in the "main file". In any case, it will probably only work on NTFS drives.
If using SVN is not an option, you can just store each revision in an individual file (with filename that represents date for example). You can use separate files for metadata as well. Then all the aforementioned files are zipped into one file (look at http://DotNetZip.codeplex.com/ for example).
I don't think there is much point building this yourself when there are already tens, if not hundreds, of systems that are basically designed to do exactly this - source control systems.
I'd recommend choosing some source control solution that has bindings to C# and store your document in there. Then you can easily check out any revision of the document. You will also be able to diff, branch, etc. if necessary.
To give just one example to get you started you can use Subversion with C# bindings.
I think that the already SVN (or another source control system) is a very good idea because source control seems to have exactly the features you require. But if that's not an option you could use a file database like SQL Server Compact Edition or SQLite.
精彩评论