PHP string comparison for strings which look the same in some views, but not in others

2023-03-19 10:34

I have two strings that look the same when I echo them, but when I var_dump() them, they are different string types:

Echo:

http://blah
http://blah

var_dump:

string(14) "http://blah"
string(11) "http://blah"

strToHex:

%68%74%74%70%3a%2f%2f%62%6c%61%68%00%00%00
%68%74%74%70%3a%2f%2f%62%6c%61%68

When I compare them, they return false. How can I manipulate the string type, so that I can perform a comparison that returns true?

What is the difference between string(11) and string(14)? I am sure there is a simple resolution, but I have not found anything yet. No matter how I implode, explode, UTF-8 encode, etc., the strings still do not compare as equal, and the type does not change.

The letter "a" may be a different character that only looks like the Latin "a".

For example: blаh. Here a is a Cyrillic 'а'.

All of these letters are Cyrillic, but they look like Latin letters: у, е, х, а, р, о, с
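A quick way to expose such a look-alike is to compare byte lengths and hex dumps. The sketch below writes the Cyrillic letter as the escape \u{0430} (available since PHP 7.0) so the difference is explicit in the source; the variable names are illustrative only:

```php
<?php
// Latin "blah" versus "blah" with CYRILLIC SMALL LETTER A (U+0430).
$latin    = "blah";
$cyrillic = "bl\u{0430}h";

var_dump($latin === $cyrillic);   // bool(false)
echo strlen($latin), "\n";        // 4
echo strlen($cyrillic), "\n";     // 5: U+0430 takes two bytes in UTF-8
echo bin2hex($latin), "\n";       // 626c6168
echo bin2hex($cyrillic), "\n";    // 626cd0b068 (d0b0 is the Cyrillic "а")
```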

Trim the strings before comparing them. They may contain invisible characters such as \t, \n, or NUL bytes (\0), all of which trim() removes from both ends by default.

$clean_str = trim($str);
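A minimal sketch of this advice applied to the question's own data, assuming the hex dumps in the question are accurate ($a and $b are illustrative names):

```php
<?php
// The two values reconstructed from the question's hex dumps: the first
// one ends in three NUL bytes (%00), which echo does not display.
$a = "http://blah\0\0\0";
$b = "http://blah";

var_dump($a === $b);             // bool(false)
var_dump(strlen($a));            // int(14)

// By default trim() strips " ", "\t", "\n", "\r", "\0" and "\x0B" from
// both ends, so the NUL padding is removed as well.
var_dump(trim($a) === trim($b)); // bool(true)
```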

In var_dump() output, string(14) means that the value is a string holding 14 bytes. So string(11) and string(14) are not different "types" of strings; they are just strings of different lengths.

I would use something like this to see what is actually inside those strings:

function strToHex($value, $prefix = '') {
    $result = '';
    $length = strlen($value);
    for ( $n = 0; $n < $length; $n++ ) {
        $result .= $prefix . sprintf('%02x', ord($value[$n]));
    }
    return $result;
}

echo strToHex("test\r\n", '%');

Output:

%74%65%73%74%0d%0a

This decodes as:

%74 - t
%65 - e
%73 - s
%74 - t
%0d - \r (carriage return)
%0a - \n (line feed)
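Building on the same idea, a hypothetical helper findInvisibleBytes() (not part of the original answer, nor a PHP built-in) can list only the suspicious bytes together with their offsets:

```php
<?php
// Hypothetical helper: report every byte outside printable ASCII, keyed by
// its offset. Bytes of multibyte UTF-8 characters are reported too, which
// also helps to spot look-alike letters.
function findInvisibleBytes(string $value): array
{
    $found = [];
    $length = strlen($value);
    for ($n = 0; $n < $length; $n++) {
        $byte = ord($value[$n]);
        // 0x20 (space) through 0x7E (~) is the printable ASCII range.
        if ($byte < 0x20 || $byte > 0x7E) {
            $found[$n] = sprintf('%02x', $byte);
        }
    }
    return $found;
}

print_r(findInvisibleBytes("http://blah\0\0\0"));
// Array ( [11] => 00 [12] => 00 [13] => 00 )
```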

Or, as pointed out in comments by Karolis, you can use the built-in function bin2hex():

echo bin2hex("test\r\n");

Output:

746573740d0a
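For a side-by-side comparison it can help to space out the bin2hex() output. A small sketch using the built-in chunk_split(), applied to the two strings as reconstructed from the question's hex dumps:

```php
<?php
$a = "http://blah\0\0\0";
$b = "http://blah";

// chunk_split() appends a space after every two hex digits (one byte);
// trim() removes the trailing space it leaves at the end.
echo trim(chunk_split(bin2hex($a), 2, ' ')), "\n";
echo trim(chunk_split(bin2hex($b), 2, ' ')), "\n";
```

This prints "68 74 74 70 3a 2f 2f 62 6c 61 68 00 00 00" and "68 74 74 70 3a 2f 2f 62 6c 61 68", which makes the three extra 00 bytes easy to see.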

Try to trim these strings:

if (trim($string1) == trim($string2)) {
  // Do things
}

The strings may contain multibyte characters: strlen() and var_dump() count bytes, and in UTF-8 every character outside ASCII takes two to four bytes.

Use mb_strlen() to count characters instead of bytes.
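For example, assuming the mbstring extension is installed and the strings are UTF-8 (the sample string is illustrative):

```php
<?php
// "héllo" written with an escape so the source file encoding does not matter.
$s = "h\u{00E9}llo";

echo strlen($s), "\n";               // 6: "é" (U+00E9) is two bytes in UTF-8
echo mb_strlen($s, 'UTF-8'), "\n";   // 5

// mb_strlen() does not explain the question's NUL padding, though:
// NUL is a character too, so this still reports 14.
echo mb_strlen("http://blah\0\0\0", 'UTF-8'), "\n"; // 14
```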

Also, some characters may be present but invisible (Unicode has many space characters, zero-width characters, etc.).
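trim() only removes a fixed set of single-byte characters, so such Unicode characters survive it. One possible workaround is sketched below with a hypothetical unicodeTrim() helper built on preg_replace() and the /u modifier; which characters count as "invisible" here is an assumption you may need to extend:

```php
<?php
// U+00A0 NO-BREAK SPACE and U+200B ZERO WIDTH SPACE do not show up when
// echoed, and trim() leaves them in place.
$s = "\u{00A0}http://blah\u{200B}";

var_dump(trim($s) === "http://blah");        // bool(false)

// Hypothetical helper: strip leading/trailing ASCII whitespace (\s),
// Unicode separators (\p{Z}), zero width space and the BOM (U+FEFF).
// preg_replace() returns null on error (e.g. invalid UTF-8); fall back
// to the unchanged input in that case.
function unicodeTrim(string $s): string
{
    $pattern = '/^[\s\p{Z}\x{200B}\x{FEFF}]+|[\s\p{Z}\x{200B}\x{FEFF}]+$/u';
    return preg_replace($pattern, '', $s) ?? $s;
}

var_dump(unicodeTrim($s) === "http://blah"); // bool(true)
```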

Generally, when you work with Unicode strings, you should use the mb_* string functions.
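For instance, a byte-based substr() can cut a multibyte character in half, while mb_substr() counts characters (again assuming mbstring is installed and the input is UTF-8):

```php
<?php
$s = "h\u{00E9}llo"; // "héllo"

$byteCut = substr($s, 0, 2);             // "h" plus only the first byte of "é"
$charCut = mb_substr($s, 0, 2, 'UTF-8'); // "hé"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken UTF-8
var_dump($charCut === "h\u{00E9}");            // bool(true)
```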

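You can overload the standard string functions in php.ini (the mbstring.func_overload setting) so that the mb_* versions are always used instead (I am not sure whether Xdebug honors that setting). Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0.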

PHP 6 was expected to solve this problem by being Unicode-aware throughout, but it was abandoned and never released.
