Managing documents using Git
I am working on a website where I will be able to create projects and upload data to each of them. The data will mostly be spreadsheets, images, PDFs, etc. Ideally, I would like a VCS-style setup (preferably Git) where each time I update a particular document, I can simply commit that document to a repo. Any ideas on how I could go about implementing this would be helpful.
You can call git in a subshell after each upload.
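As a rough sketch of what calling git after an upload might look like (Python here; the function names, the repository layout and the author string are all made up for illustration, and the repo is assumed to exist already with a committer identity configured):

```python
import subprocess
from pathlib import Path


def git_commands(rel_path: str, message: str, author: str) -> list[list[str]]:
    """Build the git invocations for committing one uploaded file.

    Hypothetical helper: rel_path is the file's path relative to the repo root.
    """
    return [
        ["git", "add", "--", rel_path],
        # --author records the web user rather than the system account that
        # runs git; git expects the form "Name <email>".
        ["git", "commit", "-m", message, "--author", author, "--", rel_path],
    ]


def commit_upload(repo_dir: Path, rel_path: str, message: str, author: str) -> None:
    """Run the commands inside repo_dir.

    Raises subprocess.CalledProcessError if git exits non-zero, which also
    happens when the uploaded file is identical to the committed version.
    """
    for cmd in git_commands(rel_path, message, author):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with user-supplied file names and commit messages.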
But I don't think using any VCS is a good solution for document versioning, especially in a web application. With office-like documents you will be dealing mostly with binary data, and VCSs suck (no exceptions) when it comes to binary data. You will not be able to do any meaningful diff, and the metadata management is not suited to this kind of use: the author of a commit is mostly bound to a particular account (and you will probably be using one system account for git), and no additional information is stored beyond the basic file information (size, permissions, ctime), so you will have to store the rest (authorship, permissions for web application users, additional metadata) somewhere nearby yourself. Also note that several users can commit data at the same time, so there will be branches in your versioning. Once you have a huge dataset (and with binary office files that can come quicker than you think), you will not be able to partition such a repository.
IMO, using a VCS here gives you very little gain and introduces additional problems.
I'd advise keeping metadata in a database (file name, revisions, additional stuff) and keeping file revisions on disk. Keep each file with its revisions in a separate, unique directory. One tip here: don't use the file names that come from the upload. Use a hash function to calculate a unique name based on content and metadata.
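To illustrate the hashed-name tip, here is a minimal sketch in Python. The function names, the choice of SHA-256 and the choice of metadata fields (original file name and uploader) are my assumptions, not something prescribed above; the database side is left out entirely.

```python
import hashlib
from pathlib import Path


def stored_name(content: bytes, original_name: str, uploader: str) -> str:
    """Derive an on-disk name from the content plus some metadata."""
    digest = hashlib.sha256()
    for part in (content, original_name.encode("utf-8"), uploader.encode("utf-8")):
        # Length-prefix each field so that different field boundaries
        # can never produce the same byte stream.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def save_revision(root: Path, document_id: str, content: bytes,
                  original_name: str, uploader: str) -> Path:
    """Write one revision into the document's own directory and return its path."""
    doc_dir = root / document_id  # one directory per document
    doc_dir.mkdir(parents=True, exist_ok=True)
    target = doc_dir / stored_name(content, original_name, uploader)
    target.write_bytes(content)
    return target
```

The returned path (or just the hash) is what you would record in the database next to the original file name, revision number and author.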
There isn't a universal "commit on save" feature (at least not one integrated with all the editors associated with the document types you mention).
The easiest way would be a background job which would commit (or 'git add -A && git commit -m "xxx"' in the case of Git) every 5 minutes, for instance.
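A background job along those lines could look roughly like this in Python. It is only a sketch: the function names and the commit message are invented, and it checks 'git status --porcelain' first so that it does not attempt an empty commit when nothing has changed.

```python
import subprocess
import time
from pathlib import Path


def has_changes(porcelain_output: str) -> bool:
    """'git status --porcelain' prints one line per changed path, nothing if clean."""
    return bool(porcelain_output.strip())


def commit_all_if_changed(repo_dir: Path, message: str) -> bool:
    """Stage and commit everything in repo_dir; return True if a commit was made."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not has_changes(status.stdout):
        return False
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True


def run_forever(repo_dir: Path, interval_seconds: int = 300) -> None:
    """Snapshot the repository every interval_seconds (300 s = 5 minutes)."""
    while True:
        commit_all_if_changed(repo_dir, "Automatic snapshot")
        time.sleep(interval_seconds)
```

Instead of the loop in run_forever you could just as well call commit_all_if_changed from cron or any other scheduler.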
Actually, Mark Longair comments:
flashbake is designed to be run from cron to do what you describe in the second paragraph with some kind of reasonable commit message.
I'm not sure that that's what the original poster is after, though.
Original project here:
- Automated backup is nice unless you have files for which you want to view an incremental history.
- Source control is great for that history but most tools expect the author to manually commit their changes along the way.
- => A seamless source control solution combines the convenience of automated back up with the power of source version control.
As a branch off of Cezio's answer: if you would really like to use a VCS for version control, consider LaTeX. Since it is essentially source code that is compiled into a document (usually a PDF via pdflatex), it's a reasonable candidate for version control.