Convert double-byte numbers and spaces in filenames to ASCII
Given a directory of filenames consisting of double-byte/full-width numbers and spaces (along with some half-width numbers and underscores), how can I convert all of the numbers and spaces to single-byte characters?
For example, this filename consists of a double-byte number, followed by a double-byte space, followed by some single-byte characters:
２　2_3.ext
and I'd like to change it to all single-byte like so:
2 2_3.ext
I've tried convmv to convert from utf8 to ascii, but the following message appears for all files:
"ascii doesn't cover all needed characters for: filename"
You need either (1) normalization from Java 1.6 (java.text.Normalizer), or (2) ICU, or (3, unlikely) a product sold by the place I work.
What tools do you have available? There are Unicode normalisation functions in several scripting languages, for example in Python:
    import os
    import unicodedata

    for child in os.listdir(u'.'):
        # NFKC folds full-width digits and the ideographic space to ASCII
        normal = unicodedata.normalize('NFKC', child)
        if normal != child:
            os.rename(child, normal)
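To check what NFKC does to the example filename before renaming anything, here is a minimal sketch, assuming Python 3. The helper name `to_halfwidth` is made up for illustration, and the sample is written with escapes so the full-width digit two (U+FF12) and the ideographic space (U+3000) are unambiguous.

```python
import unicodedata


def to_halfwidth(name):
    """Return name with compatibility characters folded by NFKC."""
    return unicodedata.normalize("NFKC", name)


# U+FF12 is FULLWIDTH DIGIT TWO, U+3000 is IDEOGRAPHIC SPACE.
sample = "\uff12\u30002_3.ext"
print(to_halfwidth(sample))  # prints "2 2_3.ext"
```

Note that NFKC is not limited to digits and spaces: it also folds other compatibility characters (ligatures, half-width katakana, and so on), so it is worth printing the proposed names first if the directory contains other non-ASCII text.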
Thanks for your quick replies, bmargulies and bobince. I found a Perl module, Unicode::Japanese, that helped get the job done. Here is a bash script I made (with help from this example) to convert filenames in the current directory from full-width to half-width characters:
    #!/bin/bash
    for file in *; do
        # printf with a quoted argument keeps spaces in the name intact
        newfile=$(printf '%s' "$file" | perl -MUnicode::Japanese -e 'print Unicode::Japanese->new(<>)->z2h->get;')
        test "$file" != "$newfile" && mv -- "$file" "$newfile"
    done
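If only the digits and spaces should change and every other character must be left alone, a translation table is another option. This is a rough sketch, assuming Python 3. The names `FULLWIDTH_TO_ASCII`, `narrow_digits_and_spaces` and `rename_all` are hypothetical, and skipping a rename when the target already exists is my own design choice, not something from the scripts above.

```python
import os

# Map full-width digits (U+FF10..U+FF19) to ASCII 0-9 and the
# ideographic space (U+3000) to an ordinary space; nothing else changes.
FULLWIDTH_TO_ASCII = {0xFF10 + i: ord("0") + i for i in range(10)}
FULLWIDTH_TO_ASCII[0x3000] = ord(" ")


def narrow_digits_and_spaces(name):
    """Return name with only full-width digits and spaces converted."""
    return name.translate(FULLWIDTH_TO_ASCII)


def rename_all(directory="."):
    """Rename entries in directory, skipping any that would collide."""
    for old in os.listdir(directory):
        new = narrow_digits_and_spaces(old)
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if new != old and not os.path.exists(dst):
            os.rename(src, dst)
```

Calling `rename_all()` processes the current directory. Nothing is renamed at import time, so the mapping can be tried on a few strings first.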