Change Website Character encoding from iso-8859-1 to UTF-8

2022-12-09 04:37 Q&A author:

About 2 years ago I made the mistake of starting a large website using iso-8859-1. I am now having issues with some characters, especially when sending data to the server using ajax. Because of this, I would like to switch to using UTF-8.

What issues do you see coming from this? I know I would have to search the site to look for characters that need to be changed from a ? to their real characters. But, are there any other risks in doing this? Has anyone done this before?

The main difficulty is making sure you've checked that all the data paths are UTF-8 clean:

- Is your site DB-backed? If so, you'll need to convert all the tables to UTF-8 or some other Unicode encoding, so sorting and text searching work correctly.
- Is your site using some programming language for dynamic content? (PHP, mod_perl, ASP...?) If so, you'll have to make sure the particular language interpreter you're using fully understands some form of Unicode, work out the conversions if it isn't using UTF-8 natively (UTF-16 is the next most common), and check that it's configured to use UTF-8 on its output to the web server.
- Does your site have some kind of back-end app server? Does it use UTF-8 for its text outputs?
- There are at least three different places you can declare the charset for a web document. Be sure you change them all:
  - the HTTP Content-Type header
  - the <meta http-equiv="Content-Type"> tag in your documents' <head>
  - the <?xml> tag at the top of the document, if using XHTML Strict
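
To catch a page where one of those three declarations was missed, something like the following rough Python sketch could help. It is not a full HTML parser: the regular expressions and the function names (declared_charsets, conflicting) are my own illustration and only cover the common ways of writing each declaration.

```python
import re

# Loose pattern for a charset/encoding label such as "UTF-8" or "iso-8859-1".
LABEL = r'([\w.:-]+)'

def declared_charsets(content_type_header: str, body: str) -> dict:
    """Return the charset named in each of the three places (None if absent)."""
    found = {}

    # 1. HTTP header value, e.g. "text/html; charset=UTF-8"
    m = re.search(r'charset=["\']?' + LABEL, content_type_header, re.I)
    found["http_header"] = m.group(1).lower() if m else None

    # 2. <meta http-equiv="Content-Type" content="text/html; charset=...">
    m = re.search(r'<meta[^>]*charset=["\']?' + LABEL, body, re.I)
    found["meta_tag"] = m.group(1).lower() if m else None

    # 3. <?xml version="1.0" encoding="..."?>
    m = re.search(r'<\?xml[^>]*encoding=["\']' + LABEL, body, re.I)
    found["xml_declaration"] = m.group(1).lower() if m else None

    return found

def conflicting(found: dict) -> bool:
    """True if the declarations that are present do not all agree."""
    return len({v for v in found.values() if v is not None}) > 1
```

Feeding it the response header and body of each page on the test site gives a quick list of pages that still claim iso-8859-1 somewhere.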

All this comes from my experience years ago, when I traced some Unicode data through a moderately complex N-tier app and found conversion chains like:

Latin-1 → UTF-8 → Latin-1 → UTF-8

So, even though the data ended up in the browser claiming to be "UTF-8", the app could still only handle the subset common with Latin-1.
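
A small Python illustration of why such a chain limits you to the Latin-1 subset. The sample string and the function name are made up for the example, and using "?" as the replacement is just one way a lossy hop can behave:

```python
def through_latin1(s: str) -> str:
    """Simulate one hop through Latin-1 and back.

    Characters with no Latin-1 representation are replaced with "?".
    """
    latin1_bytes = s.encode("latin-1", errors="replace")
    return latin1_bytes.decode("latin-1")

# "\u00e9" (e with acute) exists in Latin-1; "\u20ac" (euro sign) does not.
sample = "caf\u00e9 \u20ac5"
survived = through_latin1(sample)   # the accented letter survives, the euro sign becomes "?"

# The opposite mistake: UTF-8 bytes read as if they were Latin-1.
# The two bytes of "\u00e9" turn into two separate characters.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

The final output can be perfectly valid UTF-8 and still contain the "?" or the two-character junk, because the damage happened at an earlier hop.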

The biggest reason for those odd conversion chains was immature Unicode support in the tooling at the time, but you can still find yourself dealing with ugliness like this if you're not careful to make the whole pipeline UTF-8 clean.

As for your comments about searching out Latin-1 characters and converting files one by one, I wouldn't do that. I'd build a script around the iconv utility found on every modern Linux system, feeding in every text file in your system, explicitly converting it from Latin-1 to UTF-8. Leave no stone unturned.
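
If iconv isn't handy, roughly the same batch conversion can be sketched in Python. This is a stand-in for the iconv approach, not the same thing: the suffix list is an assumption about what counts as a text file on your site, and unlike a blind iconv run it skips files that already decode as valid UTF-8, which is a heuristic to avoid converting a file twice. Run it on a copy of the site first.

```python
from pathlib import Path

# Assumption: adjust to whatever text file types the site actually contains.
TEXT_SUFFIXES = {".html", ".htm", ".php", ".css", ".js", ".txt"}

def convert_file(path: Path) -> bool:
    """Rewrite one file from Latin-1 to UTF-8. Returns True if it was changed."""
    raw = path.read_bytes()
    try:
        # Pure ASCII and already-converted files decode cleanly; leave them alone.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Latin-1 maps every byte to a character, so this decode cannot fail.
    path.write_bytes(raw.decode("latin-1").encode("utf-8"))
    return True

def convert_tree(root: Path) -> list[Path]:
    """Convert every matching file under root; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            if convert_file(path):
                changed.append(path)
    return changed
```

Because Latin-1 accepts any byte sequence, the script cannot tell a Latin-1 file from, say, a Windows-1252 one. Spot-check the list of changed files afterwards.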

Such a change touches (nearly) every part of your system. You need to go through everything, from the database to the PHP to the HTML to the web browser.

Start a test site and subject it to some serious testing (various browsers on various platforms doing various things).

IMO it's important to actually get familiar with UTF-8 and what it means for software. A few quick points:

- PHP is mostly byte-oriented. Learn the difference between characters and code points and bytes, and between UTF-8 and Unicode.
- UTF-8 is well-designed. For instance, given two UTF-8 strings, a byte-oriented strstr() will still function correctly.
- The most common problem is treating a UTF-8 string as ISO-8859-1 and vice versa - you may need to add documentation to your functions stating what kind of encoding they expect, to make these sorts of errors less likely. A variable naming convention for your strings (to indicate what encoding they use) may also help.
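
These points are easy to see in a few lines. The sketch below uses Python rather than PHP only because Python keeps text (str) and bytes as separate types, which makes the distinction visible. The variable names follow the kind of naming convention suggested above and are otherwise arbitrary.

```python
# Text as characters (code points); escapes used so the source is pure ASCII.
title_text = "na\u00efve caf\u00e9"
needle_text = "caf\u00e9"

# The same text as UTF-8 bytes, which is what a byte-oriented language sees.
title_utf8 = title_text.encode("utf-8")
needle_utf8 = needle_text.encode("utf-8")

# Characters and bytes are counted differently: each accented letter is 2 bytes.
char_count = len(title_text)
byte_count = len(title_utf8)

# A byte-oriented search (the strstr() case) still finds the right place,
# but the offset it reports is in bytes, not characters.
char_offset = title_text.find(needle_text)
byte_offset = title_utf8.find(needle_utf8)

# The common bug: UTF-8 bytes handed to code that assumes ISO-8859-1.
misread_text = title_utf8.decode("latin-1")
```

Here the character count is 10 and the byte count is 12, and the search hits at character 6 but byte 7, because the one two-byte letter before the match shifts the byte offset by one.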
