Tracking Excel files in Version Control
We are branching out beyond the development team and trying to get other groups within my company to use version control for important documents that need change tracking. One frequent need is for Excel spreadsheets. These are large spreadsheets, modified fairly frequently (weekly or monthly) but with only a small portion of the cells changed each time.
Just sticking the files in subversion (the particular tool we are using) gives a history of changes and keeps old versions. And the TortoiseSVN client makes it easy for non-technical users. Recent versions of TortoiseSVN even contain a script which can be used to perform nice visual diffs between Excel documents.
My remaining concern is disk space. These are large documents. The diffs between versions are small, bu开发者_如何学编程t I worry that the version control will notice that the file is binary and fall back to storing each version separately. Does anyone know of a solution to this? For instance, a format we could save in in which the diffs would be small so only differences would be saved, or a version control system which is specifically aware of Excel files? I have not yet done performance testing, but our version control server is already badly taxed and if there is a better solution I'd love to know what it is.
Currently SVN cannot efficently store those types of files. There has been some discussion about it though
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=651443
This SO question shows a graph when storing an OpenXML office document. The results were pretty linear
Will Subversion efficiently store OpenXML Office documents?
Although your question wasn't specifically about that format it may still apply. You might just need to run a test in SVN and see what kind of storage it takes. SVN is pretty good at storing binary files, so it might not be too terrible. The SO question above also mentions saving the file as a plain text XML 2003 document, which you might investigate also.
One consideration is using Team Foundation Server for source control (if that's an option), which will just store your delta changes, although it may be a bit heavy for what you're looking for.
From my understanding, binary vs. text doesn't have an impact on the storage size in SVN: http://help.collab.net/index.jsp?topic=/faq/svnbinary.html
精彩评论