How can I tell TortoiseHg to display a UTF-16 file as non-binary?

Posted 2023-03-17 17:15

In a Microsoft Access 2007 project, the Access form objects are exported to files by dedicated software that uses the built-in function "SaveAsText". This is necessary because Access doesn't store any of its code modules in isolated files on its own.

The file starts with the bytes "FF FE" (which is the UTF-16 byte order mark according to http://de.wikipedia.org/wiki/Byte_Order_Mark). I presume that because of the many NUL characters in this file, Hg treats it as a binary file. Hence the diff pane in the TortoiseHg Workbench always says

File or diffs not displayed: File is binary.

which is quite understandable under this assumption. Nevertheless, this file is just ordinary source code. I can view it in Windows Notepad, for example, without any problems.

Is there any way to tell Mercurial that this particular file should be treated as text, not binary?

Edit: In addition to the accepted answer below, I decided not to change the saving behaviour, but to use the "Visual Diff" command (select the file, then press Ctrl+D) instead.

I'm guessing that you frequently or occasionally export the form objects in order to track source code changes.

The only way to convince Mercurial that a file is not binary is to avoid NUL bytes.

You may want to convert the source code files to ASCII (or maybe ANSI) encoding as an additional step in your export in order to avoid the NUL bytes. If the source code files contain Unicode characters, you might try UTF-8, as it uses multi-byte sequences only when necessary and single bytes otherwise, thus avoiding NUL bytes again. I tried it out briefly and Mercurial handles UTF-8: it doesn't show "File is binary", but the actual diff. I committed on the command line, but viewed the diff in TortoiseHg. I have a link about command-line encoding challenges below.
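The conversion step described above could look roughly like the following Python sketch. It is not part of the original answer: the function name `utf16_to_utf8` and the idea of running it as a post-export step are assumptions made for illustration, and it assumes the exported file begins with a UTF-16 byte order mark, as described in the question.

```python
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Hypothetical helper: re-encode a UTF-16 file (with BOM) as UTF-8."""
    # The "utf-16" codec uses the BOM (FF FE or FE FF) to pick the byte
    # order and drops it from the decoded text.
    text = src.read_bytes().decode("utf-16")
    # UTF-8 output contains no NUL bytes unless the text itself
    # contains the character U+0000. Bytes are written as-is, so the
    # original line endings are preserved.
    dst.write_bytes(text.encode("utf-8"))
```

Reading and writing raw bytes (rather than opening the files in text mode) avoids any newline translation, so the only change to the file is its encoding.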

The hgrc encode/decode sections might be particularly useful in helping to filter the UTF-16 files into something that works better.

A couple of other pages on Mercurial and encoding:

Character Encoding On Windows
Encoding Strategy

TortoiseHg 2.1 + Mercurial 1.9

From https://www.mercurial-scm.org/wiki/BinaryFiles:

The question naturally arises, what is a binary file anyway? It turns out there's really no good answer to this question, so Mercurial uses the same heuristic that programs like diff(1) use. The test is simply if there are any NUL bytes in a file.

For diff, export, and annotate, this will get things right almost all of the time and it will not attempt to process files it thinks are binary. If necessary, you can force these commands to treat files as text with -a.
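To see why a UTF-16 export trips this heuristic, here is a minimal Python sketch of the NUL-byte test described in the quote. The function name `looks_binary` is made up for this illustration; it mirrors the described check and is not a copy of Mercurial's own code.

```python
def looks_binary(data: bytes) -> bool:
    """Hypothetical stand-in for the heuristic: any NUL byte means binary."""
    return b"\x00" in data


sample = "Option Compare Database"

# UTF-16 little-endian with BOM: every ASCII character is followed by a 00 byte.
as_utf16 = b"\xff\xfe" + sample.encode("utf-16-le")
# UTF-8: ASCII characters are single non-zero bytes.
as_utf8 = sample.encode("utf-8")

print(looks_binary(as_utf16))  # True
print(looks_binary(as_utf8))   # False
```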

This didn't exist at the time the question was asked, but now there's the msaccess-vcs-integration project, which exports/imports MS Access objects so that they can be version controlled.

Quote from the project's readme:

Encoding

For Access objects which are normally exported in UCS-2-little-endian encoding, the included module automatically converts the source code to and from UTF-8 encoding during export/import; this is to ensure that you don't have trouble branching, merging, and comparing in tools such as Mercurial which treat any file containing 0x00 bytes as a non-diffable binary file.

If you export your forms and modules with this instead of directly using Access's SaveAsText function, Mercurial will not treat the files as binary.
