Unicode PHP source files

2023-02-25 17:47

For a project I'm currently working on, I needed to add some Unicode characters to some PHP files.

So I needed to use Unicode encoding, of course.

That made me wonder:

What prevents me from using Unicode for all my PHP files?

Nothing prevents you from using Unicode in all your PHP files. If you do, though, you may need to edit your scripts wherever the chosen encoding interferes with how the script processes strings.

There are some things to remember when you work with UTF-8 encoded source files:

1. Some editors add a BOM at the beginning of the file. This can corrupt the script output, so save your files without a BOM.
2. strlen and other string functions may not work as you expect, because they count bytes rather than characters. Use the multibyte string functions for string length, etc.: http://php.net/manual/en/book.mbstring.php
3. Regular expressions need the u modifier to work with Unicode characters.
4. Be careful when you work with files and pay attention to the current encoding: when a file does not contain a BOM (see #1), an editor may open it in the system default encoding.
5. Some source code tools may not work correctly with UTF-8 files (because the files do not contain a BOM, although some tools work incorrectly even when the files have one).
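To make the points about strlen and the u modifier concrete, here is a minimal sketch. It assumes the file itself is saved as UTF-8 without a BOM; the variable names are only for illustration, and the mb_strlen call is guarded because the mbstring extension is not always installed.

```php
<?php
// Assumption: this file is saved as UTF-8 without a BOM, so the
// literals below are stored as UTF-8 byte sequences.
$text = "日本語"; // 3 characters, 3 bytes each in UTF-8

// strlen() counts bytes, not characters.
$byteCount = strlen($text); // 9

// mb_strlen() counts characters when told the encoding.
// It needs the mbstring extension, so the call is guarded here.
$charCount = function_exists('mb_strlen') ? mb_strlen($text, 'UTF-8') : null;

// Without the u modifier, "." matches a single byte, so a
// three-byte character does not match "exactly one character".
$withoutU = preg_match('/^.$/', "語"); // 0

// With the u modifier, pattern and subject are treated as UTF-8.
$withU = preg_match('/^.$/u', "語"); // 1

printf(
    "bytes=%d chars=%s withoutU=%d withU=%d\n",
    $byteCount,
    var_export($charCount, true),
    $withoutU,
    $withU
);
```

With mbstring available, the character count comes out as 3 while the byte count stays 9.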

In my experience, it is sometimes better to store such strings in resources (text files or similar) and not use UTF-8 in code files. At other times it is fine; it depends on whether it actually causes you problems.

What's “Unicode encoding”?

Unicode is a character set; there are lots of encodings between Unicode and bytes, many of them mapping only a subset of possible characters.

When you want to use non-ASCII Unicode characters in a PHP script, the usual best choice of encoding is UTF-8, as it's an ASCII-superset encoding (i.e. the lower 128 values of each byte always mean the standard ASCII characters) that can still represent any Unicode character. PHP, like many other byte-oriented tools, can only reliably work with ASCII-superset encodings.
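A small sketch of what "ASCII-superset" means in practice, assuming the file is saved as UTF-8: every byte that belongs to a non-ASCII character has a value of 0x80 or above, so a byte-wise search for an ASCII character such as "<" can never land inside a multi-byte character. The sample string and variable names are made up for the example.

```php
<?php
// Assumption: this file is saved as UTF-8 without a BOM.
$text = "価格 < 100"; // two 3-byte characters followed by six ASCII bytes

$asciiBytes = 0;
$highBytes = 0;
foreach (str_split($text) as $byte) {
    if (ord($byte) < 0x80) {
        $asciiBytes++; // plain ASCII, same meaning as in an ASCII file
    } else {
        $highBytes++;  // part of a multi-byte UTF-8 sequence
    }
}

// Byte-oriented search still finds the real "<" at a byte offset.
$offset = strpos($text, '<');

printf("ascii=%d high=%d offset=%d\n", $asciiBytes, $highBytes, $offset);
```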

If by “Unicode encoding” you mean the thing that Notepad and other Windows tools call “Unicode”, that's quite a different proposition. This is a misleading name for what is correctly known as the UTF-16LE encoding. This encoding has a two-byte-per-code-unit width, which means, e.g., that normal ASCII characters come out with zero bytes between them. It's not an ASCII-superset, so PHP and other byte-based tools can't do much with it directly.
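To see the zero bytes, here is a rough sketch that encodes ASCII text as UTF-16LE by hand. The helper asciiToUtf16le is hypothetical and only valid for ASCII input; real conversions would go through a proper converter.

```php
<?php
// Hypothetical helper: encode ASCII text as UTF-16LE by hand, one
// 16-bit little-endian code unit per character. ASCII input only.
function asciiToUtf16le(string $ascii): string
{
    $out = '';
    foreach (str_split($ascii) as $ch) {
        $out .= pack('v', ord($ch)); // 'v' = unsigned 16-bit, little-endian
    }
    return $out;
}

$utf8 = "<?php"; // pure ASCII, so one byte per character in UTF-8
$utf16le = asciiToUtf16le($utf8);

echo bin2hex($utf8), "\n";    // 3c3f706870
echo bin2hex($utf16le), "\n"; // 3c003f00700068007000

// A byte-oriented search no longer finds the opening tag.
var_dump(strpos($utf16le, "<?php")); // bool(false)
```

The byte sequence of the opening tag simply does not occur in the UTF-16LE version, which is why a byte-based tool cannot treat such a file like an ASCII-compatible one.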

When saving scripts in Windows-based editors, look to save in UTF-8 (without BOM), and serve your pages with a UTF-8 Content-Type charset. Although it's the default in-memory representation for Windows, Java and JavaScript, UTF-16LE is of pretty much zero use for storing files or serving web pages.

What prevents me from using Unicode for all my PHP files?

The specific encoding might. PHP itself does not interpret the encoding of a source file; it reads the file only as a sequence of bytes.

The only Unicode encoding that is compatible with PHP on the source-file level is UTF-8.

Take care not to save PHP files with a UTF-8 BOM. PHP treats the BOM as ordinary text and outputs it, because it comes before the opening <?php tag:

{UTF8-BOM}<?php

The output is invisible but is three bytes long, which either causes "headers already sent" errors or inserts text nodes into the DOM where they are not expected.
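If you suspect a file was saved with a BOM, a check along these lines can help. This is only a sketch: hasUtf8Bom and stripUtf8Bom are hypothetical helper names, and the sample string stands in for the raw contents of a file.

```php
<?php
// Hypothetical helper: true when the data starts with the UTF-8 BOM,
// which is the three bytes EF BB BF.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the data without a leading UTF-8 BOM.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Stand-in for the raw bytes of a script saved with a BOM.
$withBom = "\xEF\xBB\xBF" . "<?php echo 'hi';";
$clean = stripUtf8Bom($withBom);

var_dump(hasUtf8Bom($withBom)); // bool(true)
var_dump(hasUtf8Bom($clean));   // bool(false)
```

In practice you would read the bytes of the file you want to check, and fix the editor's save settings rather than strip the BOM at runtime.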
