To hook or not to hook - git
Our bespoke IDE outputs XML files with an encoding that makes them look like binary files. Diffs and merges of these files fail.
We can create ASCII versions of these files with the tr command. I would like to get to a state where these files are always automatically converted to ASCII before they are committed.
I picked up my copy of Version Control with Git and it wholeheartedly warns me away from using hooks unless I really need to.
Should I be using a hook for this purpose? Or can I do something else to ensure the files are always converted before commit?
Windows XP with msysgit 1.7.4
--= update =--
Thanks everyone for your help and patience. Looking at this question, I tried the following, but it does not work:
echo "*.xrp filter=xrp" > .git/info/attributes
git config --global filter.xrp.clean 'tr -cd '\''\11\12\15\40-\176'\'''
git config --global filter.xrp.smudge cat
git checkout --force
The files remain unchanged after this config change, even when I delete them and check them out again.
The tr command configured as the clean filter does work in isolation. Proof:
$ head -n 1 cashflow/repo/C_GMM_CashflowRepo.xrp
ÿþ< ! - - X M L R e p o s i t o r y f i l e 1 . 0 - - >
$ tr -cd '\11\12\15\40-\176' < cashflow/repo/C_GMM_CashflowRepo.xrp | head -n 1
<!-- XML Repository file 1.0 -->
Can anyone see what is wrong with my config?
One issue with hooks is that they aren't distributed.
.gitattributes has directives that control how a file is diffed and how its content is treated, but another option would be an attribute filter (also declared in .gitattributes), which could automatically convert those files on commit.
(That is, if the clean script is able to detect those files based on their content alone.)
Per this chat discussion, the OP Synesso reports a success:
.gitattributes:
*.xrp filter=xrp
~/.gitconfig:
[filter "xrp"]
clean = \"C:/Program Files/Git/bin/tr.exe\" -cd "\\''\\11\\12\\15\\40-\\176'\\'"
smudge = cat
Then I had to modify the file, add, commit, delete, checkout ... and THEN it was fixed. :)
Note that, for any modification which concerns not just one user but potentially any user cloning that repo, I prefer adding (and committing) an extra .gitattributes file in which the filter is declared, rather than modifying the .git/info/attributes file (which is not copied by a clone).
From the gitattributes man page:
- If you wish to affect only a single repository (i.e., to assign attributes to files that are particular to one user’s workflow for that repository), then attributes should be placed in the $GIT_DIR/info/attributes file.
- Attributes which should be version-controlled and distributed to other repositories (i.e., attributes of interest to all users) should go into .gitattributes files.
- Attributes that should affect all repositories for a single user should be placed in a file specified by the core.attributesfile configuration option.
- Attributes for all users on a system should be placed in the $(prefix)/etc/gitattributes file.
http://git-scm.com/docs/gitattributes
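As a rough illustration of the filter setup described above, here is a minimal sketch that runs in a throwaway repository. The sample file, its contents, the use of repository-local config instead of --global, and the LC_ALL=C prefix are all assumptions made for the demo, not part of the original report. Note that the filter definition lives in git config, which is not cloned either, so each user would still have to define filter.xrp once.

```shell
# Throwaway repository so the sketch does not touch real work.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# The attribute goes into a committed .gitattributes file.
printf '*.xrp filter=xrp\n' > .gitattributes

# Filter definition in repository-local config (assumption: the original used
# --global). LC_ALL=C is also an assumption, added so tr works on raw bytes.
git config filter.xrp.clean "LC_ALL=C tr -cd '\11\12\15\40-\176'"
git config filter.xrp.smudge cat

# Hypothetical sample file: UTF-16LE BOM, then "<a>" and a newline,
# each character followed by a NUL byte.
printf '\377\376<\000a\000>\000\n\000' > sample.xrp

git add .gitattributes sample.xrp

# The clean filter runs when the file is staged, so the blob in the index
# is already plain ASCII even though the working copy is untouched.
staged=$(git show :sample.xrp)
printf '%s\n' "$staged"
```

Because the clean filter only runs when a file is staged, a forced checkout alone does not rewrite anything, which is consistent with the modify, add, commit, delete, checkout sequence reported above.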
phyatt adds in the comments:
I made an example similar to this for sqlite3.
You can add it into the correct files with two lines:
git config diff.sqlite3.textconv 'sqlite3 $1 .dump'
echo '*.db diff=sqlite3' >> $(git rev-parse --show-toplevel)/.gitattributes
Similar lines can be used for writing other git config paths.
Does diff stand a chance of working on them as is (i.e. they just contain a handful of strange bytes but are otherwise text) or not? If it does, you can just force git to treat them as text with .gitattributes. If not, it might still be better to create custom diff and merge scripts (which use tr as needed to convert) and tell git to use them, again with .gitattributes.
In either case you will not be using hooks (those run on particular operations) but .gitattributes, which is file-specific.
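To answer that question for a given file, it helps to look at the raw bytes. The following is a minimal sketch using a made-up sample (a UTF-16LE byte order mark followed by "<!--"); the helper name sample_bytes and the LC_ALL=C prefix are assumptions for illustration. Git's built-in heuristic treats content as binary when it finds a NUL byte near the start, so a file shaped like this sample would not diff as text without extra configuration.

```shell
# Hypothetical sample: UTF-16LE BOM, then "<!--" and a newline,
# each character followed by a NUL byte.
sample_bytes() { printf '\377\376<\000!\000-\000-\000\n\000'; }

# Size of the raw data in bytes (tr -d strips padding some wc versions add).
raw_len=$(sample_bytes | wc -c | tr -d ' ')

# Keep only tab, LF, CR and printable ASCII; the BOM and NULs are dropped.
ascii=$(sample_bytes | LC_ALL=C tr -cd '\11\12\15\40-\176')

printf 'raw bytes: %s\nstripped:  %s\n' "$raw_len" "$ascii"
```

If the stripped text is readable and the raw size is roughly twice its length, the file is more likely a 16-bit encoding than text with a handful of odd bytes.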
If your preferred editing format were ASCII and only your builds required the binary files, I would recommend using build rules to generate the binary version from the preferred source, which you would commit to the repository.
Given that your IDE makes the files in the binary format already, I think the best thing is to store them in the repository in that format.
Rather than hooks, look at git help attributes, especially diff and textconv, which allow you to configure files matching certain patterns to use alternate means of diffing. You should be able to produce working ASCII diffs without having to compromise how you store the files or edit them.
EDIT: Based on your comment elsewhere that "every other byte is 0", that suggests the file is UTF-16 or UCS-2. See this answer for a diff which can handle Unicode: Can I make git recognize a UTF-16 file as text?
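A minimal sketch of the textconv approach, assuming the files really are UTF-16 with a byte order mark and that iconv is available. The driver name xrp, the sample file, and the demo identity passed to git commit are hypothetical. The stored file stays in its original encoding; only the diff output is converted.

```shell
# Throwaway repository so the sketch does not touch real work.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# Tell git to diff *.xrp files with a driver named "xrp" (hypothetical name).
printf '*.xrp diff=xrp\n' > .gitattributes

# git appends the file path to the textconv command when it runs it.
git config diff.xrp.textconv 'iconv -f UTF-16 -t UTF-8'

# Hypothetical UTF-16LE file with BOM: "<a>" followed by a newline.
printf '\377\376<\000a\000>\000\n\000' > sample.xrp
git add .gitattributes sample.xrp
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m 'initial'

# Change "a" to "b" in the working copy; the file is still UTF-16.
printf '\377\376<\000b\000>\000\n\000' > sample.xrp

# The diff is computed on the converted text rather than on the raw bytes.
diff_out=$(git diff -- sample.xrp)
printf '%s\n' "$diff_out"
```

Unlike a clean filter, this leaves the repository content byte-for-byte as the IDE wrote it, so the IDE never has to read a converted file.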