How to make a pre-commit hook that prevents non-UTF-8 file encodings
Is it possible to make a pre-commit hook for Git or SVN that rejects files that are not committed in a specific encoding?
I have worked on several projects where sticking to a single file encoding (UTF-8, for instance) turned out to be a problem.
Your iconv may be able to tell you if something is not UTF-8, but other encodings may not be so easy to check (especially 8-bit, single-byte encodings like ISO-8859-1, where almost any byte sequence decodes without error).
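To illustrate that asymmetry, here is a small sketch. The helper name decodes_as is made up for this example, and it assumes an iconv that knows the names UTF-8 and ISO-8859-1 and exits non-zero on invalid input (GNU and BSD iconv both do, as far as I know).

```shell
# Succeeds if stdin decodes cleanly from the encoding named in $1.
# (decodes_as is a hypothetical helper, not part of iconv or Git.)
decodes_as() {
    iconv -f "$1" -t UTF-8 >/dev/null 2>&1
}

# Byte 0xE9 (octal 351) is "e with acute" in ISO-8859-1,
# but on its own it is not a valid UTF-8 sequence.
printf 'caf\351\n' | decodes_as UTF-8      || echo "Latin-1 bytes rejected as UTF-8"
printf 'caf\351\n' | decodes_as ISO-8859-1 && echo "Latin-1 bytes accepted as ISO-8859-1"

# Genuine UTF-8 (0xC3 0xA9) also decodes as ISO-8859-1, just into the wrong
# characters, so a successful ISO-8859-1 decode proves nothing.
printf 'caf\303\251\n' | decodes_as ISO-8859-1 && echo "UTF-8 bytes accepted as ISO-8859-1 too"
```

So a "is it UTF-8?" check is practical, while a "is it ISO-8859-1?" check by decoding alone is not.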
For Git, you may actually want an update hook instead of a pre-commit hook (so that it can be run in a central repository to enforce the rule).
Git pre-commit hook:
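#!/bin/sh
# List every path in the index (NUL-separated) and check each staged blob.
git ls-files -z -- |
xargs -0 sh -c '
    e=""
    for f; do
        # ":path" names the staged version of the file, not the working copy
        if ! git show :"$f" |
            iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            e=1
            echo "Not UTF-8: $f"
            #exit 255 # to abort after first non-UTF-8 file
        fi
    done
    # a non-zero exit status makes Git abort the commit
    test -z "$e"
' -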
Put one or more Git pathspecs after the -- on the git ls-files command line to limit the pathnames that are checked.
To check the tip of the updated ref in an update hook, use git ls-tree --name-only -r -z $3 -- | to generate the pathnames (note: it does not handle pattern pathspecs like git ls-files, so do any pattern-based filtering in the shell code) and git show "$3:$f" to extract the file contents. You might also want to check not only the tip commit, but each new commit (loop for each commit in git rev-list ^$2 $3 instead of just $3).
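Put together, the update-hook variant might look roughly like the sketch below. The function names (is_zero, check_commit, check_push) are invented for this example, and two details go beyond what is described above: a deleted ref (all-zero new id) is skipped, and for a newly created ref (all-zero old id) the commits are listed with git rev-list <new> --not --all, since ^<old> is meaningless there. It checks every blob in every new commit, so it will also flag binary files unless you filter paths in the loop.

```shell
# True when an object id consists only of zeros (ref creation or deletion).
is_zero() {
    case $1 in
        *[!0]*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Check every file in the tree of commit $1; non-zero status if any is not UTF-8.
check_commit() {
    git ls-tree --name-only -r -z "$1" -- |
    xargs -0 sh -c '
        c=$1; shift
        e=0
        for f; do
            if ! git show "$c:$f" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                echo "Not UTF-8: $f (in commit $c)"
                e=1
            fi
        done
        exit "$e"
    ' - "$1"
}

# Check each commit introduced by moving a ref from $1 (old id) to $2 (new id).
check_push() {
    old=$1 new=$2
    is_zero "$new" && return 0            # ref is being deleted
    if is_zero "$old"; then
        # new ref: only commits not reachable from any existing ref
        commits=$(git rev-list "$new" --not --all)
    else
        commits=$(git rev-list "^$old" "$new")
    fi
    status=0
    for c in $commits; do
        check_commit "$c" || status=1
    done
    return "$status"
}

# Git runs hooks/update as: update <refname> <old-id> <new-id>,
# so the hook script would end with:
#   check_push "$2" "$3"
```

Because the exit status of the last command becomes the hook's exit status, ending the script with the check_push call is enough to make Git refuse the ref update when any file fails the check.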
Pre-commit hooks are just scripts. So if you can determine the encoding in a script, you can use that information to reject the wrong sort of file.
You could search the file for characters outside of the normal character range. If there's a magic number or a tag to tell you the encoding for a file, you can check that. Otherwise ask yourself "how would I know this file is in the wrong encoding?" Can you code that up?
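As a rough sketch of those two ideas: the first helper looks for the UTF-8 byte order mark (bytes EF BB BF) as a "magic number", the second looks for any byte outside the 7-bit ASCII range. Both names (has_utf8_bom, is_ascii) are made up here, and head -c is assumed to be available (it is on GNU and BSD systems). Neither check proves an encoding on its own: a BOM is only a hint, and many UTF-8 files have none.

```shell
# True if stdin starts with the UTF-8 byte order mark EF BB BF.
has_utf8_bom() {
    [ "$(head -c 3 | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# True if stdin contains only 7-bit ASCII bytes (0x00-0x7F).
# Deletes all ASCII bytes and checks that nothing is left over.
is_ascii() {
    [ "$(LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')" = "0" ]
}

printf '\357\273\277hello\n' | has_utf8_bom && echo "starts with a UTF-8 BOM"
printf 'hello\n'             | is_ascii     && echo "pure ASCII"
printf 'caf\351\n'           | is_ascii     || echo "contains non-ASCII bytes"
```

Pure ASCII is valid in UTF-8 and in most 8-bit encodings, so files that pass is_ascii can be accepted without any further check.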
You could use the iconv utility to convert the file from UTF-8 to, for example, UTF-16. If the conversion fails, the source file is not valid UTF-8:
$ iconv -f UTF-8 -t UTF-16 Strings.java
ÿþ
testing = iconv: illegal input sequence at position 11