Git says "Binary files a... and b... differ" for *.reg files

Is there a way to force Git into treating .reg files as text? I am using Git to track my Windows registry tweaks, and Windows uses the .reg extension for these files.

UPDATE 1: I got it to run a diff (thanks, Andrew). However, the output now looks like this. Is this an encoding issue?

index 0080fe3..fc51807 100644
--- a/Install On Rebuild/4. Registry Tweaks.reg
+++ b/Install On Rebuild/4. Registry Tweaks.reg
@@ -1,49 +1,48 @@
-<FF><FE>W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ^@E^@d^@i^@t^@o^@r^@
-^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;^@;
-^@^M^@
...

Any ideas?

UPDATE 2: Thanks to all who helped. Here's what I did in the end: I created a .gitattributes file with the content *.reg text diff, and then converted the files to UTF-8, since UTF-16 does not diff well. I'm not using any non-ASCII characters, so UTF-8 works for me.


To tell Git explicitly to diff a file type as text, put the following in a .gitattributes file in your repository's root directory:

*.reg diff


Quick Answer

As others have pointed out, this issue is caused by an encoding mix-up. You have two options:

  • Change the file encoding to UTF-8 by re-saving it accordingly.

  • Create a .gitattributes file, and include the following:

    *.reg working-tree-encoding=UTF-16LE-BOM eol=crlf

Cause

By default, registry exports from the Windows Registry Editor are saved as UTF-16 (little-endian, with a byte order mark). Git only recognizes files encoded in ASCII or one of its supersets (such as UTF-8) as text, so when it sees a UTF-16 file, it finds many NUL bytes and concludes that the file is binary.

Asking Git to treat the file as text by setting a *.reg diff attribute doesn't fully work, because Git still assumes the wrong encoding and prints the UTF-16 bytes as if they were ASCII-compatible text. That's why you saw all of those ^@ characters: each one is a NUL byte.
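
To make the cause concrete, here is a minimal Python sketch of why a UTF-16 export trips a byte-oriented binary check. The looks_binary helper is hypothetical and only approximates Git's behavior; the 8000-byte window is an assumption about Git's heuristic, not something stated in this thread.

```python
import codecs

# The first line of a typical registry export.
REG_HEADER = "Windows Registry Editor Version 5.00\r\n"


def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Hypothetical stand-in for Git's check: a NUL byte near the start means binary.

    The window size is an assumption, not a documented guarantee.
    """
    return b"\x00" in data[:window]


# UTF-16LE with a byte order mark, as seen in the diff output above (<FF><FE>).
utf16_bytes = codecs.BOM_UTF16_LE + REG_HEADER.encode("utf-16-le")
utf8_bytes = REG_HEADER.encode("utf-8")

print(utf16_bytes[:10])           # b'\xff\xfeW\x00i\x00n\x00d\x00'
print(looks_binary(utf16_bytes))  # True
print(looks_binary(utf8_bytes))   # False
```

Every ASCII character in the UTF-16 version is followed by a zero byte, which is exactly the ^@ that shows up after each letter in the diff.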

Solutions

One solution that others have suggested is to save the UTF-16 files as UTF-8 and that totally works! It does have one big disadvantage though: if you have a lot of .reg files, or you want to re-export a key from the Registry Editor, you'll have to re-save it with the correct encoding every time.

Alternatively, you can tell Git what encoding you plan to use with the working-tree-encoding attribute. When this is specified, Git will convert a text file to UTF-8 as it is committed to the repository, and then convert it back to the original encoding as it gets checked out. That way, the file always has the original encoding when it appears in your working directory. If you're familiar with end-of-line normalization, the behavior is similar to that.
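
The round trip described above can be sketched in Python. This only illustrates the idea and is not how Git implements it: the function names to_repository and to_working_tree are made up for the example, and line-ending conversion (the eol part) is deliberately left out.

```python
import codecs


def to_repository(working_tree_bytes: bytes) -> bytes:
    """Check-in direction: UTF-16LE with BOM -> UTF-8 without BOM."""
    bom = codecs.BOM_UTF16_LE
    if not working_tree_bytes.startswith(bom):
        raise ValueError("expected a UTF-16LE byte order mark")
    text = working_tree_bytes[len(bom):].decode("utf-16-le")
    return text.encode("utf-8")


def to_working_tree(repository_bytes: bytes) -> bytes:
    """Check-out direction: UTF-8 -> UTF-16LE with BOM."""
    text = repository_bytes.decode("utf-8")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


original = codecs.BOM_UTF16_LE + "Windows Registry Editor Version 5.00\r\n".encode("utf-16-le")
stored = to_repository(original)    # what the repository would hold
restored = to_working_tree(stored)  # what lands back in the working directory

print(stored)                # b'Windows Registry Editor Version 5.00\r\n'
print(restored == original)  # True
```

Because the stored form contains no NUL bytes, diffs run against ordinary UTF-8 text, while the file on disk keeps the encoding the Registry Editor produced.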

If you take this route, there are a few pitfalls to be aware of:

  1. The attribute is relatively new (it was added in 2018, in Git 2.18), so if you need to support a wide range of Git implementations or versions, it could cause trouble.
  2. If you're going beyond small UTF-16 files, encoding conversion could slow things down or, depending on the encoding, not make the round-trip unscathed.

For these reasons, the documentation recommends using this attribute only if the file cannot be stored usefully as UTF-8, but depending on your use case these pitfalls may not concern you. Finally, when using this attribute, it's important to also specify which end-of-line characters are in use to avoid ambiguity. That's done with the eol attribute.

Putting it all together, I recommend you try creating a .gitattributes file in your repository's root, and including the following line:

*.reg working-tree-encoding=UTF-16LE-BOM eol=crlf


Git is treating your registry export files as binary files because they have NULs. There is no good way to diff or merge general binary files. A change of one byte can change the interpretation of the rest of the file.

There are two general approaches to handling binary files:

  1. Accept that they're binary. Diffs aren't going to be meaningful, so don't ask for them. Don't ever merge them, which means only allowing changes on one branch. In this case, this can be made easier by putting each tweak (or set of related tweaks) in a separate file, so there are fewer ways for differences to arise in one file.

  2. Store the changes as text, and convert/deconvert to these binary forms.

Even though these are "text" files, the UTF-16 encoding contains NULs. There appear to be no non-ASCII characters, however. Can you convert them to ASCII (or UTF-8, which is identical to ASCII if there are no extended characters)?


Convert the .reg files from UTF-16 to UTF-8 by opening each one in Notepad and saving it with the encoding set to UTF-8.
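
If there are many files, the same conversion could be scripted instead of done by hand in Notepad. The following is a rough Python sketch under a few assumptions: the helper names convert_reg_to_utf8 and convert_tree are invented for illustration, only files that start with a UTF-16 byte order mark are touched, and the demo at the end runs against a temporary directory rather than real registry files.

```python
import codecs
import tempfile
from pathlib import Path
from typing import List


def convert_reg_to_utf8(path: Path) -> bool:
    """Rewrite one file as UTF-8 if it starts with a UTF-16 BOM; return True if changed."""
    data = path.read_bytes()
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False  # no UTF-16 BOM, so leave the file alone
    text = data.decode("utf-16")  # the BOM selects the byte order and is dropped
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_tree(root: Path) -> List[Path]:
    """Convert every *.reg file under root and return the paths that were rewritten."""
    return [p for p in sorted(root.rglob("*.reg")) if convert_reg_to_utf8(p)]


# Demo on a throwaway directory so no real files are modified.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "tweak.reg"
    header = "Windows Registry Editor Version 5.00\r\n"
    sample.write_bytes(codecs.BOM_UTF16_LE + header.encode("utf-16-le"))
    changed = convert_tree(Path(tmp))
    print([p.name for p in changed])  # ['tweak.reg']
    print(sample.read_bytes())        # b'Windows Registry Editor Version 5.00\r\n'
```

To use it on a real repository, call convert_tree with the repository path after committing or backing up the originals, since the files are overwritten in place.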


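Create a script named utf16toascii.py:

#!/usr/bin/env python3
import sys

# Git passes the path of the file to convert as the last argument.
with open(sys.argv[-1], 'rb') as f:
    data = f.read()
# Decode UTF-16 (the BOM selects the byte order); non-ASCII characters become '?'.
ascii_bytes = data.decode('utf-16').encode('ascii', 'replace')
sys.stdout.buffer.write(ascii_bytes)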

Then make the script executable (chmod +x) and run the following in bash:

$ echo "*.reg diff=utf16strings" >> .gitattributes
$ git config --global diff.utf16strings.textconv /path/to/utf16toascii.py

And you're good to diff registry files, as well as Xcode .strings files, or any other UTF-16 file.
