PHP iconv_strlen() meaning question

2023-03-07 07:58 Q&A author:

I was wondering what the following sentence means, in simple terms, for us dummies.

And what is a byte sequence? How many characters are in a byte?

iconv_strlen() counts the occurrences of characters in the given byte sequence str on the basis of the specified character set, the result of which is not necessarily identical to the length of the string in byte.

Let's take for example the Japanese character 'こ'. Assuming UTF-8 encoding, this is a 3-byte character (0xE3 0x81 0x93). Let's see what happens when we use strlen instead:

$ php -r 'echo strlen("こ") . "\n";'
3

The result is 3, since strlen is counting bytes. However, this is only a single character according to UTF-8 encoding. That's where iconv_strlen comes in. It knows that in UTF-8, this is a single character, even though it's made up of 3 bytes. So if we try this instead:

$ php -r 'echo iconv_strlen("こ", "UTF-8") . "\n";'
1

We get 1. That's what that explanation is meant to point out.
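To see roughly what a charset-aware count does differently from strlen, you can count UTF-8 characters by hand. This is only a sketch: it assumes the input is valid UTF-8, and utf8_char_count() is a hypothetical helper, not a PHP function. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that do not match that pattern counts the characters.

```php
<?php
// Hypothetical helper (not part of PHP): counts characters in a string
// that is ASSUMED to be valid UTF-8. Malformed input is not detected.
function utf8_char_count(string $bytes): int
{
    $count = 0;
    $len = strlen($bytes); // strlen() always counts bytes
    for ($i = 0; $i < $len; $i++) {
        // Continuation bytes look like 10xxxxxx (0x80-0xBF).
        // Every other byte starts a new character, so count only those.
        if ((ord($bytes[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

$s = "\xE3\x81\x93"; // 'こ' encoded as UTF-8: 0xE3 0x81 0x93
echo strlen($s), "\n";          // 3 (bytes)
echo utf8_char_count($s), "\n"; // 1 (character)
```

On a PHP build with the iconv extension, iconv_strlen($s, "UTF-8") should report the same 1 for this input. Unlike the sketch, iconv_strlen also checks the input and returns false if it runs into an invalid byte sequence.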

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

A string has a particular length in bytes. The number of characters in that string will be equal to the number of bytes if and only if each character in the string is represented by a single byte. This is true, for example, for English letters. For representations (i.e., encodings) that use more than one byte to represent some or all characters, the number of characters will be less than the number of bytes*. It is not possible, for example, to represent all possible Chinese characters with a single byte.

So iconv_strlen, given an encoding, will try to count the number of characters in the string. The byte sequence is simply the string's bytes, in order. For a string containing Chinese, using UTF-8 encoding, you might, for example, have a 20-byte string that has 14 characters.
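The 20-byte, 14-character figure is easy to reproduce. The sample string below is hypothetical: 11 ASCII characters (1 byte each in UTF-8) plus 3 Chinese characters (3 bytes each), so 11 × 1 + 3 × 3 = 20 bytes but 11 + 3 = 14 characters. To count characters it uses preg_match_all() with the /u modifier, which is a different technique from iconv_strlen but needs only PHP's built-in PCRE functions.

```php
<?php
// Hypothetical sample: "PHP 字符串 tests!"
// 11 ASCII characters (1 byte each) + 3 Chinese characters (3 bytes each).
$s = "PHP \u{5B57}\u{7B26}\u{4E32} tests!";

$bytes = strlen($s); // counts bytes

// With the /u modifier PCRE treats the subject as UTF-8, so "." matches one
// whole character; /s lets "." match newlines too. preg_match_all() returns
// the number of matches, i.e. the number of characters here.
$chars = preg_match_all('/./su', $s);

echo "$bytes bytes, $chars characters\n"; // 20 bytes, 14 characters
```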

*The character count could exceed the byte count only if some characters were represented by less than one byte each.

iconv_strlen() counts the occurrences of characters in the given byte sequence str on the basis of the specified character set, the result of which is not necessarily identical to the length of the string in byte.

Translations:

byte sequence: another word for string, which is a sequence of bytes (1 byte = 8 bits), e.g.: 01011010 00011001 01101011. Byte sequences represent characters like A, B, C etc.
character set: a.k.a. encoding, specifies how bytes map to characters; e.g. 01000001 represents A in the ASCII character set.
not necessarily identical to the length […] in byte: in the ASCII character set, one byte represents exactly one character. This is not the case for all character sets; in some, two, three or more bytes are used to represent one character. That is because one byte can only hold 256 different values and some languages are written using more than 256 characters (like Chinese and Japanese). Unicode even attempts to map all characters of all human languages into a single character set, which needs more than one byte for most characters (UTF-8, for example, uses one to four bytes per character).
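To make the terms in this list concrete, the sketch below prints bytes as bits. byte_bits() is a hypothetical helper, not a PHP function. It shows that 'A' is the single byte 01000001, while 'こ' in UTF-8 is three bytes: the leading 1110 of the first byte tells a UTF-8 decoder that this character spans three bytes, and each of the two continuation bytes starts with 10.

```php
<?php
// Hypothetical helper (not part of PHP): show each byte of a string as 8 bits.
function byte_bits(string $s): string
{
    $out = [];
    foreach (str_split($s) as $byte) { // str_split() splits per byte, not per character
        $out[] = sprintf('%08b', ord($byte));
    }
    return implode(' ', $out);
}

echo byte_bits("A"), "\n";            // 01000001 (one byte, one character)
echo byte_bits("\xE3\x81\x93"), "\n"; // 11100011 10000001 10010011 ('こ' in UTF-8)
echo chr(bindec('01000001')), "\n";   // A (the byte 01000001 read as ASCII)
```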

In summary:

iconv_strlen() counts the characters in the given string, taking into account the character set. Therefore, the number of characters may not be equal to the number of bytes.
