Highlighting and replacing non-printable Unicode characters in Emacs
I have a UTF-8 file containing some Unicode characters like LEFT-TO-RIGHT OVERRIDE (U+202D) which I want to remove from the file. In Emacs, they are hidden by default (which is presumably the correct behavior). How do I make such "exotic" Unicode characters visible, without changing the display of "regular" Unicode characters like German umlauts? And how do I replace them afterwards, with replace-string for example? (C-x 8 RET does not work for isearch or replace-string.)
In Vim, it's quite easy: these characters are displayed with their hex representation by default (is this a bug or a missing feature?) and you can easily remove them with :%s/\%u202d//g for example. Surely this is possible with Emacs too?
You can open the file with M-x find-file-literally; you will then see these characters, shown as the raw bytes of their UTF-8 encoding (\342\200\255 for U+202D).
Then you can remove them with the usual replace-string.
How about this:
Put the U+202D character you want to match at the top of the kill ring by typing M-: (kill-new "\u202d"). Then you can yank that string into the various searching commands, with either C-y (e.g. query-replace) or M-y (e.g. isearch-forward).
(Edited to add:)
You could also just call commands non-interactively, which doesn't present the same keyboard-input difficulties as the interactive calls. For example, type M-: and then:
(replace-string "\u202d" "")
This is somewhat similar to your Vim version. One difference is that it only performs replacements from the cursor position to the bottom of the file (or narrowed region), so you'd need to go to the top of the file (or narrowed region) prior to running the command to replace all matches.
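To avoid having to move to the top of the buffer first, the replacement can be wrapped in a small function. This is only a sketch: my-remove-char-everywhere is a made-up name, not a built-in command, and it simply deletes every occurrence of one character in the accessible part of the buffer while leaving the cursor where it was.

```elisp
;; Hypothetical helper (the name is an assumption, not part of Emacs).
(defun my-remove-char-everywhere (char)
  "Delete every occurrence of CHAR in the accessible part of the buffer.
Return the number of characters deleted."
  (let ((needle (string char))
        (case-fold-search nil)          ; match the character exactly
        (count 0))
    (save-excursion                     ; restore the cursor afterwards
      (goto-char (point-min))           ; start at the top (or of the narrowed region)
      (while (search-forward needle nil t)
        (replace-match "" t t)          ; delete the match literally
        (setq count (1+ count))))
    count))
```

After evaluating the definition, typing M-: and then (my-remove-char-everywhere #x202d) should remove every LEFT-TO-RIGHT OVERRIDE regardless of where the cursor is.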
I also have this issue, and it is particularly annoying for commits, as it may be too late to fix the log message by the time one notices the mistake. So I've modified the function I use when I type C-x C-c to check whether there is a non-printable character, i.e. one matching "[^\n[:print:]]", and if there is one, to put the cursor over it, output a message, and not kill the buffer. Then it is possible to manually remove the character, replace it with a printable one, or whatever, depending on the context.
The code to use for the detection (and positioning the cursor after the non-printable character) is:
(progn
  (goto-char (point-min))
  (re-search-forward "[^\n[:print:]]" nil t))
Notes:
- There is no need to save the current cursor position since here, either the buffer will be killed or the cursor will be put over the non-printable character on purpose.
- You may want to slightly modify the regexp. For instance, the tab character is a non-printable character and I regard it as such, but you may also want to accept it.
- About the [:print:] character class in the regexp, you are dependent on the C library. Some printable characters may be regarded as non-printable, like some recent emojis (but not everyone cares).
- The re-search-forward return value will be regarded as true if and only if there is a non-printable character. This is exactly what we want.
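As a sketch of how the notes above might be combined, the detection can be wrapped in a function with an option to accept tabs. The name my-goto-non-printable and its allow-tab argument are assumptions made for illustration. Unlike the snippet above, it leaves the cursor alone when nothing is found, and it puts the cursor over the character rather than after it.

```elisp
;; Hypothetical helper (name and argument are assumptions).
(defun my-goto-non-printable (&optional allow-tab)
  "Move point onto the first non-printable character, if there is one.
Return its position, or nil (leaving point unchanged) if none is found.
With non-nil ALLOW-TAB, tab characters are accepted as well."
  (let ((regexp (if allow-tab
                    "[^\n\t[:print:]]"  ; newline and tab are both accepted
                  "[^\n[:print:]]"))    ; the regexp from the answer
        (found nil))
    (save-excursion
      (goto-char (point-min))
      (when (re-search-forward regexp nil t)
        (setq found (match-beginning 0))))
    (when found
      (goto-char found))
    found))
```

Whether a given character counts as printable is still decided by [:print:], so it is worth trying this on a sample of the characters you actually care about.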
Here's a snippet of what I use for Subversion commits (this sits among more complex code in my .emacs).
(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'")
and
((and (buffer-file-name)
      (string-match my-svn-commit-frx (buffer-file-name))
      (progn
        (goto-char (point-min))
        (re-search-forward "[^\n[:print:]]" nil t)))
 (backward-char)
 (message "The buffer contains a non-printable character."))
in a cond, i.e. I apply this rule only to file names used for Subversion commits. The (backward-char) is optional, depending on whether you want the cursor to be over or just after the non-printable character.
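The surrounding function is not shown here, so as a different way to get a similar effect, the same test could be hooked into kill-buffer-query-functions, which Emacs consults before killing a buffer. This is a sketch, not the code from my .emacs: my-svn-commit-kill-check is a made-up name, and this variant only moves the cursor when a character is actually found.

```elisp
(defvar my-svn-commit-frx "/svn-commit\\.\\([0-9]+\\.\\)?tmp\\'"
  "Regexp matching the file names used for Subversion commit messages.")

;; Hypothetical helper (the name is an assumption).
(defun my-svn-commit-kill-check ()
  "Return nil to veto killing a commit buffer with a non-printable character."
  (let ((pos (and (buffer-file-name)
                  (string-match my-svn-commit-frx (buffer-file-name))
                  (save-excursion
                    (goto-char (point-min))
                    (and (re-search-forward "[^\n[:print:]]" nil t)
                         (match-beginning 0))))))
    (if (not pos)
        t                               ; nothing found: allow the kill
      (goto-char pos)                   ; put the cursor over the character
      (message "The buffer contains a non-printable character.")
      nil)))                            ; nil tells Emacs not to kill the buffer

(add-hook 'kill-buffer-query-functions #'my-svn-commit-kill-check)
```

Note that this guards killing the buffer, not exiting Emacs with C-x C-c, so it would need adapting for that case.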