Batch convert HTML files on Mac OS X to UTF-8 with Unix (LF) line breaks

I am on Mac OS X Snow Leopard.

I need to batch convert a lot of .htm files that were originally created on Windows to UTF-8 with Unix (LF) line breaks.

I can batch rename all of the files to .html with NameMangler.

I can do a search/replace of all of the files to update all hyperlinks to reflect the extension change to .html using TexFinderX.

Now the last step is to do a batch convert to UTF-8 and with Unix (LF) line breaks.

Does anyone know of an app that can do this? I hope that I don't have to manually open each of the files in a text editor and save each one individually. I am afraid that I might accidentally miss some of the files…and it would take a long time to do this.

TIA, Linda


You'll want to check out this dos2unix port for Mac OS X. I haven't used it myself since I don't own a Mac, but dos2unix is the general Unix utility for converting Windows text files (CRLF line endings) to Unix text files (LF line endings).
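
For the line-ending half of the job, here is a minimal sketch. It assumes dos2unix is on the PATH and falls back to tr when it is not. The helper name to_lf is hypothetical, and the encoding conversion is a separate step that this does not attempt.

```shell
#!/bin/sh
# to_lf FILE: rewrite FILE in place with Unix (LF) line endings.
# (to_lf is a hypothetical helper name, not part of dos2unix.)
to_lf() {
  f=$1
  if command -v dos2unix >/dev/null 2>&1; then
    # dos2unix converts the named file in place.
    dos2unix "$f"
  else
    # Fallback: delete every carriage return. Unlike dos2unix, this
    # also removes stray CRs that are not part of a CRLF pair.
    tmp=$(mktemp "$f.lf.XXXXXX") || return 1
    # Writing back with cat keeps the original file's permissions.
    tr -d '\r' <"$f" >"$tmp" && cat "$tmp" >"$f"
    status=$?
    rm -f "$tmp"
    return "$status"
  fi
}
```

It could be applied to one directory with a loop such as: for f in *.html; do to_lf "$f"; done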


This was on Linux, but it should work on Mac OS. You may have to check the options to the find command, which may be slightly different on Mac OS. If you can't find recode for Mac, you can probably find iconv, and adapt the options. I actually just used this crazy not-really-oneliner on 2400+ files, of which 1400+ were converted:

 find . -regextype posix-awk -iregex ".*\.(txt|htm|html|cgi|php|pl|pm)" | while IFS= read -r f; do if ! t=$(mktemp "$f.utf8.XXXXX"); then echo "ERROR: cannot make temp file for $f"; continue; fi; echo "recoding $f to $t"; if recode cp1252/..utf8/ <"$f" >"$t"; then if diff -wq "$f" "$t"; then echo "No change: $f"; rm "$t"; else mv "$f" "$f.cp1252" && mv "$t" "$f" && echo "OK $f"; fi; else echo "ERROR: recode failed on $f (exit $?)"; rm -f "$t"; fi; done | tee -a convert-results.txt

Here is the same on several lines to make it slightly more readable:

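# -regextype is a GNU find option. BSD find on Mac OS X takes
# extended regexes with -E instead, e.g.: find -E . -iregex '...'
find . -regextype posix-awk -iregex ".*\.(txt|htm|html|cgi|php|pl|pm)" | \
  while IFS= read -r f; do
    # Create the temp file next to the original so the later mv
    # stays on the same filesystem.
    if ! t=$(mktemp "$f.utf8.XXXXX"); then
      echo "ERROR: cannot make temp file for $f"
      continue
    fi
    echo "recoding $f to $t"
    # Convert Windows-1252 to UTF-8 without touching line endings.
    if recode cp1252/..utf8/ <"$f" >"$t"; then
      if diff -wq "$f" "$t"; then
        # Nothing changed: discard the temp file.
        echo "No change: $f"
        rm "$t"
      else
        # Keep the original as FILE.cp1252 and move the new file in.
        mv "$f" "$f.cp1252" && mv "$t" "$f" && echo "OK $f"
      fi
    else
      echo "ERROR: recode failed on $f (exit $?)"
      rm -f "$t"
    fi
  done \
| tee -a convert-results.txt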

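I used cp1252/..utf8/ because my files already had LF line endings and I wanted to keep them that way. You may need to adapt that request to your files, so read the recode man page. Note that iconv converts only the character encoding and leaves line endings alone, so with iconv the CRLF-to-LF step has to be done separately.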
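
If recode is not available, the same loop can be approximated with iconv and tr. This is a minimal sketch, assuming that the files really are Windows-1252 with CRLF line endings and that the installed iconv accepts the encoding names CP1252 and UTF-8 (check the output of iconv -l). The function name convert_tree is hypothetical. It selects files with -iname tests, which avoids the GNU-only -regextype option.

```shell
#!/bin/sh
# convert_tree DIR: convert every .htm/.html file under DIR from
# Windows-1252 with CRLF to UTF-8 with LF, keeping FILE.cp1252 backups.
# Assumption: the input is CP1252. A file that is already UTF-8 would
# get its non-ASCII characters double-encoded.
# (convert_tree is a hypothetical name chosen for this sketch.)
convert_tree() {
  find "$1" -type f \( -iname '*.htm' -o -iname '*.html' \) |
  while IFS= read -r f; do
    t=$(mktemp "$f.utf8.XXXXXX") || {
      echo "ERROR: cannot make temp file for $f"
      continue
    }
    # tr strips the carriage returns, iconv changes the encoding.
    # iconv is last in the pipeline so that its exit status is tested.
    if tr -d '\r' <"$f" | iconv -f CP1252 -t UTF-8 >"$t"; then
      # cmp compares bytes exactly, so a change that only affects
      # line endings still counts as a change.
      if cmp -s "$f" "$t"; then
        echo "No change: $f"
      else
        # cp -p plus cat keeps the permissions of the original file
        # (mktemp creates its file readable by the owner only).
        cp -p "$f" "$f.cp1252" && cat "$t" >"$f" && echo "OK $f"
      fi
    else
      echo "ERROR: could not convert $f"
    fi
    rm -f "$t"
  done
}
```

Calling convert_tree . | tee -a convert-results.txt would log the results the same way as the recode version.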

Of course, back up the whole directory tree before executing any such command!
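
One way to take that backup is a compressed tar archive. This is only a sketch: the function name backup_tree and the archive name are hypothetical choices, and it expects the name of a directory rather than "." itself.

```shell
#!/bin/sh
# backup_tree DIR: write DIR.backup.tar.gz next to DIR.
# (backup_tree and the archive name are hypothetical choices.)
backup_tree() {
  dir=${1%/}
  archive="$dir.backup.tar.gz"
  # Refuse to overwrite an earlier backup.
  if [ -e "$archive" ]; then
    echo "ERROR: $archive already exists" >&2
    return 1
  fi
  # -C changes into the parent directory first, so the archive
  # stores paths relative to it (site/index.htm, not /full/path/...).
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

For example, backup_tree site run from the parent directory creates site.backup.tar.gz, and tar -tzf site.backup.tar.gz lists what was saved.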
