Batch convert HTML files on Mac OS X to UTF-8 with Unix (LF) line endings
I am on Mac OS X Snow Leopard.
I need to batch convert a lot of .htm files that were originally created on Windows to UTF-8 with Unix (LF) line breaks.
I can batch rename all of the files to .html with NameMangler.
I can use TexFinderX to search and replace across all of the files, updating the hyperlinks to reflect the extension change to .html.
The last step is to batch convert the files to UTF-8 with Unix (LF) line breaks.
Does anyone know of an app that can do this? I hope I don't have to open each file in a text editor and save it individually. I am afraid I might accidentally miss some of the files, and it would take a long time.
TIA, Linda
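You'll want to check out this dos2unix port for Mac OS X. I haven't used it myself since I don't own a Mac, but dos2unix is the standard Unix utility for converting Windows (CRLF) line endings to Unix (LF). Note that it only fixes the line endings; it does not convert Windows-1252 text to UTF-8, so you would still need a separate step for the encoding.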
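If you'd rather not install anything, the line-ending part of what dos2unix does can be approximated with tools that ship with Mac OS X. This is a minimal sketch; crlf_to_lf is a hypothetical helper name, and unlike dos2unix (which normally converts only CR-LF pairs) it deletes every carriage return in the file.

```shell
# crlf_to_lf FILE...: hypothetical helper, a rough stand-in for dos2unix
# built only from tools that ship with Mac OS X (tr, mktemp, cat).
# It deletes EVERY carriage return, not only those that end a line.
crlf_to_lf() {
  for f in "$@"; do
    tmp=$(mktemp "$f.tmp.XXXXXX") || { echo "ERROR: no temp file for $f" >&2; continue; }
    # LC_ALL=C makes tr work byte by byte, so non-UTF-8 bytes don't upset it
    if LC_ALL=C tr -d '\r' < "$f" > "$tmp"; then
      # Copy back with cat rather than mv so the file keeps its permissions
      cat "$tmp" > "$f"
    else
      echo "ERROR: could not convert $f" >&2
    fi
    rm -f "$tmp"
  done
}
```

For example, after defining the function in a Terminal session, crlf_to_lf *.html converts every .html file in the current directory in place.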
This was on Linux, but it should work on Mac OS X with small changes. In particular, check the options to the find command: -regextype is specific to GNU find, and the BSD find that ships with Mac OS X uses find -E for extended regular expressions instead. If you can't find recode for Mac, iconv ships with Mac OS X, and you can adapt the options. I just used this crazy not-really-oneliner on 2400+ files, of which 1400+ were converted:
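Adapting the recode step to iconv could look roughly like this. It is a sketch under the assumption that the source files are Windows-1252 (CP1252); to_utf8_lf is a hypothetical name. Because iconv only changes the character encoding, tr strips the carriage returns first.

```shell
# to_utf8_lf SRC DEST: hypothetical helper doing the recode step with iconv.
# Assumes SRC is Windows-1252 (CP1252). tr removes the CRs of the CRLF
# line endings (byte-wise, via LC_ALL=C), then iconv re-encodes to UTF-8.
# The exit status is iconv's, so a conversion error is reported.
to_utf8_lf() {
  LC_ALL=C tr -d '\r' < "$1" | iconv -f CP1252 -t UTF-8 > "$2"
}
```

Usage would be something like to_utf8_lf page.htm page.html, which also fits the rename to .html.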
find . -regextype posix-awk -iregex ".*\.(txt|htm|html|cgi|php|pl|pm)" | while read f; do t=`mktemp "$f.utf8.XXXXX"`; if [ ! "$?" = 0 ]; then echo "ERROR: cannot make temp file for $f"; continue; fi; echo recoding $f to $t; if cat "$f" | recode cp1252/..utf8/ >"$t" ; then if diff -wq "$f" "$t"; then echo No change: $f; rm "$t"; else mv "$f" "$f.cp1252" && mv "$t" "$f" && echo OK $f; fi; else echo "ERROR: $?"; fi; done | tee -a convert-results.txt
Here is the same on several lines to make it slightly more readable:
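    find . -regextype posix-awk -iregex ".*\.(txt|htm|html|cgi|php|pl|pm)" | \
    while read f; do
        # Create a temporary file next to the original
        t=`mktemp "$f.utf8.XXXXX"`
        if [ ! "$?" = 0 ]; then
            echo "ERROR: cannot make temp file for $f"
            continue
        fi
        echo recoding $f to $t
        # Convert Windows-1252 to UTF-8; the trailing slashes leave
        # line endings untouched (explained below)
        if cat "$f" | recode cp1252/..utf8/ >"$t" ; then
            # diff -w ignores whitespace; no difference means recode changed nothing
            if diff -wq "$f" "$t"; then
                echo No change: $f
                rm "$t"
            else
                # Keep the original as FILE.cp1252 and move the converted file into place
                mv "$f" "$f.cp1252" && mv "$t" "$f" && echo OK $f
            fi
        else
            echo "ERROR: $?"
        fi
    done \
    | tee -a convert-results.txt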
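A Mac-adapted variant of the loop above might look like the sketch below. It assumes every matching file is Windows-1252; convert_tree is a hypothetical name. It swaps in -iname tests (accepted by both BSD and GNU find) for -regextype, iconv plus tr for recode, and cmp for diff -w, since diff -w treats the carriage return as whitespace and would report "No change" for files whose only problem is CRLF line endings. Run it only once per tree: a second run would re-encode files that are already UTF-8.

```shell
# convert_tree DIR: sketch of the loop above for Mac OS X (also runs on Linux).
# Assumes every matching file is Windows-1252 with CRLF or LF line endings.
convert_tree() {
  # BSD find has no -regextype, so OR together -iname tests instead
  find "$1" -type f \( -iname '*.txt' -o -iname '*.htm' -o -iname '*.html' \
      -o -iname '*.cgi' -o -iname '*.php' -o -iname '*.pl' -o -iname '*.pm' \) |
  while IFS= read -r f; do
    t=$(mktemp "$f.utf8.XXXXX") || { echo "ERROR: cannot make temp file for $f"; continue; }
    # Strip CRs byte-wise, then re-encode CP1252 -> UTF-8 (status is iconv's)
    if LC_ALL=C tr -d '\r' < "$f" | iconv -f CP1252 -t UTF-8 > "$t"; then
      if cmp -s "$f" "$t"; then
        echo "No change: $f"
        rm -f "$t"
      else
        # Keep the original as FILE.cp1252; cat keeps the file's permissions
        cp -p "$f" "$f.cp1252" && cat "$t" > "$f" && rm -f "$t" && echo "OK $f"
      fi
    else
      echo "ERROR: could not convert $f"
      rm -f "$t"
    fi
  done
}
```

As with the original, you can pipe the output into tee -a convert-results.txt to keep a log, e.g. convert_tree . | tee -a convert-results.txt.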
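I used cp1252/..utf8/ because my files already had LF line endings and I wanted to keep them that way. You may need to adapt that to your files, so read the recode man page. iconv, on the other hand, only converts the character encoding and does not touch line endings, so with iconv you would strip the carriage returns separately (for example with tr -d '\r').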
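Whichever tool does the conversion, a quick check afterwards helps catch files that were missed. A sketch, with report_leftovers as a hypothetical name, that lists .htm/.html files still containing a carriage return or bytes that are not valid UTF-8:

```shell
# report_leftovers DIR: hypothetical helper to double-check the result.
# Prints each .htm/.html file that still contains a carriage return, and
# each one that is not valid UTF-8 (iconv -f UTF-8 fails on bad bytes).
report_leftovers() {
  find "$1" -type f \( -iname '*.htm' -o -iname '*.html' \) |
  while IFS= read -r f; do
    if LC_ALL=C grep -q "$(printf '\r')" "$f"; then
      echo "still has CR: $f"
    fi
    if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
      echo "not UTF-8: $f"
    fi
  done
}
```

An empty output from report_leftovers . means every HTML file under the current directory has LF line endings and decodes as UTF-8.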
Of course, back up the whole directory tree before running any such command!
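One simple way to make that backup from Terminal, sketched with a hypothetical backup_tree helper and an illustrative DIR.bak-DATE naming scheme; cp -Rp copies recursively and preserves modes and timestamps on both Mac OS X and Linux.

```shell
# backup_tree DIR: copy DIR to DIR.bak-YYYYMMDD-HHMMSS and print the new path.
# The naming scheme is only an illustration; any unused name works.
backup_tree() {
  src=${1%/}                              # drop one trailing slash, if any
  dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
  cp -Rp "$src" "$dest" && echo "$dest"
}
```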