Version control for reports (git)
I have a particular report that I am asked to run from time to time. The details are slightly different each time - different date ranges, different selection criteria - but structurally, the report is fairly stable. I do make some structural changes from time to time, however.
I have two hopes for these reports:
- To be able to reproduce any report at a later date.
- To be able to review the structural changes made to the report over time.
Right now, I just have a folder with a master script, which I modify for every iteration of the report, and subfolders where I save a snapshot of the master script and the data for each run.
Maybe that's good enough. But I've started using git to manage my (much more complex) data analysis scripts, and I was wondering if there was a way to use it here (and for myriad similar reports) that would allow for more robust version control.
I can think of a few different ways to do so: make a branch for each report, but only merge structural changes back onto the master; clone the master into the subfolder for a new report, make changes there, push back structural changes; etc. But I really don't even know enough to be able to separate insane ideas from plausible ones, much less good ones. What do you think?
It obviously depends on the report and how it changes, but from what you describe, it seems to me you could write a well-structured SAS macro program that takes all of your selection criteria as parameters. Inside the macro code, you can then evaluate those parameters and apply the structural changes where necessary.
So you end up with one .sas file containing one big macro, and depending on the parameters you call it with, it can reproduce any of the reports you want.
Does this make sense to you? If not, let me know; I can provide some SAS macro examples to get you started if you are not familiar with the macro language.
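The answer frames this in SAS macro terms. As a rough, language-neutral illustration of the same idea (one parameterized entry point that produces every variant of the report), here is a minimal Python sketch. The function `run_report`, its parameters, the row fields, and the `include_region_breakdown` switch standing in for a "structural change" are all hypothetical, not taken from the original report.

```python
from datetime import date


def run_report(rows, start, end, categories=None, include_region_breakdown=False):
    """Hypothetical parameterized report.

    Selection criteria and structural switches are all passed in, so any past
    run can be reproduced by calling it again with the same arguments.
    """
    # Selection criteria: date range plus an optional category filter.
    selected = [
        r for r in rows
        if start <= r["date"] <= end
        and (categories is None or r["category"] in categories)
    ]
    report = {"count": len(selected), "total": sum(r["amount"] for r in selected)}

    # A structural change controlled by a parameter instead of by editing the script.
    if include_region_breakdown:
        by_region = {}
        for r in selected:
            by_region[r["region"]] = by_region.get(r["region"], 0) + r["amount"]
        report["by_region"] = by_region
    return report


# Made-up sample rows, purely for illustration.
rows = [
    {"date": date(2010, 5, 1), "category": "A", "region": "East", "amount": 10},
    {"date": date(2010, 5, 3), "category": "B", "region": "West", "amount": 5},
    {"date": date(2010, 6, 1), "category": "A", "region": "East", "amount": 7},
]

print(run_report(rows, date(2010, 5, 1), date(2010, 5, 31), include_region_breakdown=True))
# prints {'count': 2, 'total': 15, 'by_region': {'East': 10, 'West': 5}}
```

Because the script itself no longer changes between runs, only the structural edits show up in its version history, and reproducing a run means recording the arguments it was called with.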
I'd personally go for your first suggestion:
make a branch for each report, but only merge structural changes back onto the master
This is by far the easiest approach conceptually, and by merging the structural changes into the head revision, you can apply them to the other branches as and when required. The only downside is the number of branches you'll leave lying around, but since this sounds like an infrequent request, a good naming scheme should sort that out.
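A minimal Python sketch of that workflow, under a few assumptions: the `report/<name>/<YYYY-MM-DD>` naming scheme and all helper names are hypothetical, and the base branch is assumed to be called `master`. The answer does not say how to separate structural changes from run-specific edits; one common option, used here, is to `git cherry-pick` only the structural commits onto master rather than merging the whole run branch. The helpers only build the git argument lists and, by default, print them instead of executing them.

```python
from __future__ import annotations

import re
import subprocess
from datetime import date


def report_branch_name(report: str, run_date: date) -> str:
    """Hypothetical scheme: report/<slug>/<YYYY-MM-DD>."""
    slug = re.sub(r"[^a-z0-9]+", "-", report.lower()).strip("-")
    return f"report/{slug}/{run_date.isoformat()}"


def start_run_commands(report: str, run_date: date, base: str = "master") -> list[list[str]]:
    """Create and switch to a per-run branch starting from the base branch."""
    return [["git", "checkout", "-b", report_branch_name(report, run_date), base]]


def port_structural_change_commands(commit: str, base: str = "master") -> list[list[str]]:
    """Copy one structural commit onto the base branch.

    cherry-pick (instead of merging the run branch) leaves run-specific
    edits such as date ranges behind on the run branch.
    """
    return [["git", "checkout", base], ["git", "cherry-pick", commit]]


def update_run_branch_commands(branch: str, base: str = "master") -> list[list[str]]:
    """Bring accumulated structural changes from the base into an older run branch."""
    return [["git", "checkout", branch], ["git", "merge", base]]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False (needs git on PATH)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(start_run_commands("Quarterly Sales", date(2010, 5, 5)))
# prints: git checkout -b report/quarterly-sales/2010-05-05 master
```

Putting the date in the branch name keeps `git branch --list "report/*"` sorted chronologically within each report, which is most of what the naming scheme needs to do.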
I have a particular report that I am asked to run from time to time. The details are slightly different each time - different date ranges, different selection criteria - but structurally, the report is fairly stable.
If you can anticipate which fields change each time, I would make a generic report that prompts you for those values each time it is run. You should be able to do this in just about any reporting software. The report itself can then be tracked in git, and you won't have to worry about having 50,000 branches in your repository.
If you can't predict which fields will need custom values each time, give most of the fields useful default values.
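A minimal sketch of such a generic entry point, written in Python with the standard `argparse` module rather than a reporting tool's own prompt mechanism. The field names (`--start`, `--end`, `--region`, `--min-amount`) and their defaults are hypothetical placeholders for whatever your report actually varies.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical report fields, each with a useful default, so a given run
    only needs to supply the values that actually change."""
    parser = argparse.ArgumentParser(description="Generic report (sketch)")
    parser.add_argument("--start", type=date.fromisoformat, default=date(2010, 1, 1),
                        help="first day of the date range, YYYY-MM-DD (default: %(default)s)")
    parser.add_argument("--end", type=date.fromisoformat, default=date.today(),
                        help="last day of the date range, YYYY-MM-DD (default: today)")
    parser.add_argument("--region", default="all",
                        help="region to select (default: %(default)s)")
    parser.add_argument("--min-amount", type=float, default=0.0,
                        help="ignore rows below this amount (default: %(default)s)")
    return parser


# Only the changed fields are given; the rest fall back to their defaults.
args = build_parser().parse_args(["--start", "2010-05-01", "--region", "East"])
print(args.start, args.region, args.min_amount)
# prints: 2010-05-01 East 0.0
```

Called as `build_parser().parse_args()` with no list, the parser reads the real command line instead. Saving `vars(args)` next to each run's output is one way to keep the exact parameters needed to reproduce that run later.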
If you run this report a lot and are specifically interested in keeping track of the various result sets, I'd suggest a different approach. I don't know what your report generates, but let's say it's a PDF. I would set up a directory structure somewhere and store each run as results/year/month/date.pdf. This way you will have a record of the data pulled on May 5, 2010 (or pulled with May 5, 2010 as a parameter).
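A small Python sketch of that layout using `pathlib`. The answer leaves the exact naming open, so the zero-padded month folder and the ISO-format file name (for example `results/2010/05/2010-05-05.pdf`) are assumptions, as are the helper names `result_path` and `save_result`.

```python
from datetime import date
from pathlib import Path


def result_path(run_date: date, root: Path = Path("results"), ext: str = "pdf") -> Path:
    """Build results/<year>/<month>/<YYYY-MM-DD>.<ext> (assumed naming).

    Zero-padded months and ISO dates make the files sort chronologically.
    """
    return root / f"{run_date.year:04d}" / f"{run_date.month:02d}" / f"{run_date.isoformat()}.{ext}"


def save_result(data: bytes, run_date: date, root: Path = Path("results")) -> Path:
    """Write one run's output, creating the year/month folders as needed.

    Note: a second run with the same date overwrites the first.
    """
    path = result_path(run_date, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


print(result_path(date(2010, 5, 5)).as_posix())
# prints: results/2010/05/2010-05-05.pdf
```

If you run the report more than once for the same date, add a suffix (a time or a run label) to the file name so that earlier results are not overwritten.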
Edit: You might consider tags instead of branches for those things you can't combine into a single report. If you have a version you think you're going to need quick access to, tag it. Any time you need to get back to it, just check out the tag and run the report.
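A minimal Python sketch of the tag workflow, building the git argument lists rather than running them so it can be inspected safely. The `report-<YYYY-MM-DD>` tag naming and the helper names are hypothetical. The annotated tag (`-a`) stores the tagger, date, and message; checking out a tag leaves you on a detached HEAD at exactly that version of the script.

```python
from __future__ import annotations

import subprocess


def tag_version_commands(tag: str, message: str, commit: str = "HEAD") -> list[list[str]]:
    """Create an annotated tag for a report version you will want back later."""
    return [["git", "tag", "-a", "-m", message, tag, commit]]


def rerun_from_tag_commands(tag: str, base: str = "master") -> list[list[str]]:
    """Check out the tagged version; run the report, then return to the base branch."""
    return [["git", "checkout", tag], ["git", "checkout", base]]


def list_report_tags(prefix: str = "report-") -> list[str]:
    """List existing report tags (needs git on PATH and a repository in the cwd)."""
    out = subprocess.run(["git", "tag", "--list", f"{prefix}*"],
                         check=True, capture_output=True, text=True)
    return out.stdout.split()


for cmd in tag_version_commands("report-2010-05-05", "Structure used for the May 2010 run"):
    print(cmd)
```

The report itself would run between the two checkouts in `rerun_from_tag_commands`. Unlike per-report branches, tags cannot drift, because they always point at the same commit.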