
Migrate web pages from different charsets to UTF-8

For the last few years I have used Notepad++ on Win XP SP2. As I have just seen, Notepad++ is set to encode new files as "ANSI" in "Windows Format". So basically all files on my hard disk should be ANSI files, but I'm not sure. Most .html files have a charset tag like "text/html; charset=iso-8859-1", but some have none. For other files, especially text files (for example keyword lists) that I stored with the Firefox XPCOM system, I don't know how they are currently encoded.

On the server side I have Apache with PHP and MySQL. For uploads I used FileZilla.

Now the problem is: I want to use Japanese characters (or Arabic, etc.). This only partly works. I can get my self-made Firefox application to consistently write and read UTF-8. But I can't check every time which encoding each of the old files uses.

Reading Joel Spolsky's old article about UTF-8 has strengthened my view that I simply have to switch my whole system to UTF-8 as far as possible. Once I have it running that way locally on my hard disk, I could just re-upload everything to the server.

So: how do I get all my local files converted to UTF-8? And: is it possible at all to have Win XP SP2 use UTF-8 consistently everywhere? Or do I have to check with every program, or even worse with every file, that the right encoding is used? What about files I get, for example, in e-mails or on a USB stick, or that I download in zip files? (Or a thousand other possibilities.)

Update:

Steps 1 to 4 went OK so far. I first tried with a BOM, but without one seems to work better.

Now to 5.): I have to change something there too. As in 3.), I changed the charset in the HTML template file, and the text coming from the template is displayed correctly. But the text coming from MySQL/PHP currently shows the unknown-character sign in some places, i.e. where there should be umlauts (äöü). I have changed the collation of all text fields in the MySQL database to "utf8_unicode_ci" via phpMyAdmin, but that didn't do the trick. Is it a PHP issue, or do I only have to convert the data in the MySQL database once somehow?
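
The unknown-character sign typically appears when bytes stored in a single-byte encoding such as ISO-8859-1 are read as UTF-8. The following is only a minimal Python sketch of that effect; the sample string and the variable names are illustrative assumptions and are not taken from the site in question.

```python
# Illustrative only: what happens to umlauts when Latin-1 bytes are read as UTF-8.
text = "\u00e4\u00f6\u00fc"  # the umlauts a, o, u with diaeresis

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes per character

# Decoding the Latin-1 bytes as UTF-8 fails, so each byte is replaced
# by U+FFFD, the sign a browser shows for undecodable input.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Decoding the same bytes with the encoding they were written in works.
recovered = latin1_bytes.decode("latin-1")

print(len(latin1_bytes), len(utf8_bytes))
print(misread)
print(recovered == text)
```

If the text from the database shows this sign, some step between storage and browser is probably still handing over single-byte data while the page declares UTF-8.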


  1. The beauty of UTF-8 is that it's a superset of ASCII, so if your HTML and PHP files only contain ASCII characters (i.e. unaccented English letters and programming/HTML syntax), you don't need to convert them at all. You can leave most of your files unchanged.
  2. Should you find a few exceptions that you want to convert manually, you can open them in Notepad++ and choose 'Encoding' - 'Convert to UTF-8 (No BOM)'.
  3. Yes, you do need to change or add the <meta> charset tag in all the HTML files to make sure the browser renders your files as UTF-8.
  4. In Notepad++ you can set new files to always be created as 'UTF-8 (No BOM), Unix'. Also tick "Apply to ANSI files" so old files are correctly saved in the new encoding. I suggest the Unix format because, even though you are working on a Windows machine, web servers usually run Linux/BSD, so the Unix format is the native form there (keeping files in native form is important, especially when you are using a version control system).
  5. Migrating a live site with a database is a different issue. Data in MySQL comes with its own encoding, and from your question I cannot tell whether you need to convert it or how. I'd need more specifics on that (if you need to).
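
For more than a handful of files, steps 1 and 2 can also be scripted. The following is only a rough Python sketch under stated assumptions: the old files are taken to be Windows-1252 (what Notepad++ calls "ANSI" on a Western European Windows), and the names `to_utf8_bytes` and `convert_tree` as well as the suffix list are made up for this example. Run it on a copy of your files first.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def to_utf8_bytes(raw: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Return the content as UTF-8 without BOM.

    Assumption: anything that is not valid UTF-8 is in legacy_encoding.
    """
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]  # drop the BOM, keep the rest
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (pure ASCII falls in here too)
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError if a byte is undefined in legacy_encoding.
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root, suffixes=(".html", ".php", ".txt"),
                 legacy_encoding="cp1252"):
    """Rewrite matching files below root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        raw = path.read_bytes()  # bytes, so line endings are left untouched
        converted = to_utf8_bytes(raw, legacy_encoding)
        if converted != raw:
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Calling `convert_tree("path/to/copy/of/site")` would rewrite only the files that are not yet valid UTF-8 or that carry a BOM. The validity check is a heuristic: a legacy file whose bytes happen to form valid UTF-8 is left alone. The script also does not touch the <meta> charset tags from step 3.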