Convert double-byte numbers and spaces in filenames to ASCII

Given a directory of filenames consisting of double-byte/full-width numbers and spaces (along with some half-width numbers and underscores), how can I convert all of the numbers and spaces to single-byte characters?

For example, this filename consists of a double-byte number, followed by a double-byte space, followed by some single-byte characters:

２　2_3.ext

and I'd like to change it to all single-byte like so:

2 2_3.ext

I've tried convmv to convert from utf8 to ascii, but the following message appears for all files:

"ascii doesn't cover all needed characters for: filename"


You need either (1) Unicode normalization, available since Java 1.6 as java.text.Normalizer, (2) ICU, or (3), less likely, a product sold by the place I work.
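As a rough illustration of what normalization does to the filename from the question (shown in Python rather than Java, purely as a sketch), NFKC compatibility normalization maps the full-width digit U+FF12 and the ideographic space U+3000 to their ASCII counterparts. The variable names here are only for the example:

```python
import unicodedata

# Full-width digit two (U+FF12), ideographic space (U+3000), then an ASCII tail
name = "\uff12\u30002_3.ext"

# NFKC replaces compatibility characters with their plain equivalents
ascii_name = unicodedata.normalize("NFKC", name)
print(ascii_name)  # 2 2_3.ext
```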


What tools do you have available? There are Unicode normalisation functions in several scripting languages, for example in Python:

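import os
import unicodedata

# NFKC folds full-width digits and the ideographic space (and other
# compatibility characters) to their plain forms
for child in os.listdir(u'.'):
    normal = unicodedata.normalize('NFKC', child)
    if normal != child:
        os.rename(child, normal)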
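NFKC also rewrites other compatibility characters (ligatures, circled digits, and so on), which may be more than the question asks for. If only the full-width digits and the full-width space should change, a narrower translation table is one option. This is a minimal sketch, not a tested tool: `to_halfwidth`, `rename_all`, and the `dry_run` flag are hypothetical names made up for the example, and it skips any rename whose target already exists instead of overwriting it.

```python
import os

# Map full-width digits (U+FF10..U+FF19) to ASCII 0-9, and the
# ideographic space (U+3000) to an ordinary space. Nothing else is touched.
FULLWIDTH_TO_ASCII = {0xFF10 + i: ord("0") + i for i in range(10)}
FULLWIDTH_TO_ASCII[0x3000] = ord(" ")


def to_halfwidth(name):
    """Return name with full-width digits and spaces replaced by ASCII."""
    return name.translate(FULLWIDTH_TO_ASCII)


def rename_all(directory=".", dry_run=True):
    """Rename entries in directory; return the (old, new) pairs selected.

    With dry_run=True (the default) nothing is renamed on disk.
    """
    renamed = []
    for child in sorted(os.listdir(directory)):
        target = to_halfwidth(child)
        if target == child:
            continue  # already half-width
        src = os.path.join(directory, child)
        dst = os.path.join(directory, target)
        if os.path.exists(dst):
            # Do not clobber an existing file with the same converted name
            print("skipping %r: %r already exists" % (child, target))
            continue
        if not dry_run:
            os.rename(src, dst)
        renamed.append((child, target))
    return renamed
```

Calling `rename_all()` first and inspecting the returned pairs, then `rename_all(dry_run=False)`, gives a chance to check the result before anything is renamed.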


Thanks for your quick replies, bmargulies and bobince. I found a Perl module, Unicode::Japanese, that helped get the job done. Here is a bash script I made (with help from this example) to convert filenames in the current directory from full-width to half-width characters:

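#!/bin/bash
# Convert full-width characters in filenames to half-width (needs Unicode::Japanese)
for file in *; do
    # Quote "$file" so that spaces in the name reach perl unchanged
    newfile=$(printf '%s\n' "$file" | perl -MUnicode::Japanese -e 'print Unicode::Japanese->new(<>)->z2h->get;')
    test "$file" != "$newfile" && mv -- "$file" "$newfile"
done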