
PHP string comparison for strings which look the same in some views, but not in others

I have two strings that look the same when I echo them, but when I var_dump() them they are different string types:

Echo:

http://blah
http://blah

var_dump():

string(14) "http://blah"
string(11) "http://blah"

strToHex:

%68%74%74%70%3a%2f%2f%62%6c%61%68%00%00%00
%68%74%74%70%3a%2f%2f%62%6c%61%68

When I compare them, they return false. How can I manipulate the string type, so that I can perform a comparison that returns true?

What is the difference between string 11 and string 14? I am sure there is a simple resolution, but I have not found anything yet. No matter how I implode, explode, UTF-8 encode, etc., they will not compare the strings or change type.


The letter "a" may actually be a different character that merely looks the same.

For example: blаh. Here the "а" is a Cyrillic 'а', not a Latin 'a'.

All of these letters are Cyrillic, but they look like Latin letters: у, е, х, а, р, о, с
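A minimal sketch of how such a lookalike could be spotted. The variable names are made up for illustration, and the "\u{...}" escape assumes PHP 7.0 or later with UTF-8 source strings:

```php
<?php
$latin = "blah";        // all Latin letters
$mixed = "bl\u{0430}h"; // Cyrillic "а" (U+0430) in place of the Latin "a"

var_dump($latin === $mixed); // bool(false)
echo bin2hex($latin), "\n";  // 626c6168
echo bin2hex($mixed), "\n";  // 626cd0b068

// \p{Cyrillic} with the /u modifier matches any Cyrillic-script character.
var_dump(preg_match('/\p{Cyrillic}/u', $mixed)); // int(1)
var_dump(preg_match('/\p{Cyrillic}/u', $latin)); // int(0)
```

The hex dump shows the Cyrillic letter as the two bytes d0 b0 where the Latin one is the single byte 61.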


Trim the strings before comparing. They may contain invisible characters at the start or end, such as \t, \n, or NUL bytes (\0).

$clean_str = trim($str);
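As a rough check against the data in the question: the three trailing %00 bytes in the hex dump are NUL bytes, which trim() strips by default along with spaces, \t, \n, \r and \x0B. The inputs below are assumed values rebuilt from that dump, not the asker's real variables:

```php
<?php
// Assumed inputs, reconstructed from the strToHex output in the question.
$padded = "http://blah\0\0\0"; // 11 visible characters plus 3 NUL bytes
$plain  = "http://blah";

var_dump(strlen($padded));                 // int(14)
var_dump($padded === $plain);              // bool(false)
var_dump(trim($padded) === $plain);        // bool(true)

// rtrim() with an explicit character list removes only trailing NUL bytes.
var_dump(rtrim($padded, "\0") === $plain); // bool(true)
```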


In var_dump() output, string(14) means that the value is a string holding 14 bytes. So string(11) and string(14) are not different "types" of strings; they are just strings of different lengths.

I would use something like this to see what actually is inside those strings:

function strToHex($value, $prefix = '') {
    $result = '';
    $length = strlen($value);
    for ( $n = 0; $n < $length; $n++ ) {
        $result .= $prefix . sprintf('%02x', ord($value[$n]));
    }
    return $result;
}

echo strToHex("test\r\n", '%');

Output:

%74%65%73%74%0d%0a

This decodes as:

  • %74 - t
  • %65 - e
  • %73 - s
  • %74 - t
  • %0d - \r (carriage return)
  • %0a - \n (line feed)

Or, as pointed out in comments by Karolis, you can use the built-in function bin2hex():

echo bin2hex("test\r\n");

Output:

746573740d0a
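Applied to the two strings from the question, it could look like the sketch below. The input values are assumptions rebuilt from the asker's own hex dump, and chunk_split() is used only to put a space between bytes for readability:

```php
<?php
// Assumed inputs, reconstructed from the hex dump in the question.
$first  = "http://blah\0\0\0";
$second = "http://blah";

// chunk_split() appends a space after every 2 hex digits; trim() drops the last one.
$firstHex  = trim(chunk_split(bin2hex($first), 2, ' '));
$secondHex = trim(chunk_split(bin2hex($second), 2, ' '));

echo $firstHex, "\n";  // 68 74 74 70 3a 2f 2f 62 6c 61 68 00 00 00
echo $secondHex, "\n"; // 68 74 74 70 3a 2f 2f 62 6c 61 68
```

The extra 00 00 00 at the end of the first line accounts for the difference between string(14) and string(11).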


Try to trim these strings:

if (trim($string1) == trim($string2)) {
  // Do things
}
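One possible refinement, as a sketch with made-up sample values: use === instead of ==, because == compares two numeric strings as numbers rather than byte by byte.

```php
<?php
// Hypothetical sample values with surrounding whitespace.
$string1 = " http://blah\n";
$string2 = "http://blah";

var_dump(trim($string1) === trim($string2)); // bool(true)

// With ==, two numeric strings are compared by numeric value.
var_dump("1e3" == "1000");  // bool(true)
var_dump("1e3" === "1000"); // bool(false)
```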


Probably characters outside the ASCII range are being counted as more than one byte each.

Use mb_strlen() to check lengths.

Also, some characters may be present but not visible (there are many Unicode space characters, for example).

Generally, when you work with Unicode strings, you should use the mb_* string functions.
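A small sketch of the byte-versus-character difference and of an invisible Unicode space. It assumes the mbstring extension is loaded, UTF-8 input, and PHP 7.0+ for the "\u{...}" escapes. The sample strings are invented for illustration:

```php
<?php
// "bl" + Cyrillic "а" (U+0430) + "h": 4 characters, 5 bytes in UTF-8.
$text = "bl\u{0430}h";
var_dump(strlen($text));             // int(5), counts bytes
var_dump(mb_strlen($text, 'UTF-8')); // int(4), counts characters

// A no-break space (U+00A0) is invisible, and trim() does not strip it by default.
$spaced = "\u{00A0}blah\u{00A0}";
var_dump(trim($spaced) === "blah");  // bool(false)

// \p{Z} with the /u modifier matches Unicode separator characters, including U+00A0.
$cleaned = preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $spaced);
var_dump($cleaned === "blah");       // bool(true)
```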

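You can overload the standard string functions in php.ini (the mbstring.func_overload setting) so that the mb_* functions are always used instead of the standard ones (I am not sure if Xdebug honors those settings). Note that this setting was deprecated in PHP 7.2 and removed in PHP 8.0.

PHP 6 was meant to solve this problem by being globally Unicode-aware, but it was never released.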
