PHP string comparison for strings which look the same in some views, but not in others
I have two strings that look the same when I echo them, but when I var_dump()
them they are different string types:
Echo:
http://blah
http://blah
var dump:
string(14) "http://blah"
string(11) "http://blah"
strToHex:
%68%74%74%70%3a%2f%2f%62%6c%61%68%00%00%00
%68%74%74%70%3a%2f%2f%62%6c%61%68
When I compare them, they return false. How can I manipulate the string type, so that I can perform a comparison that returns true?
What is the difference between string 11 and string 14? I am sure there is a simple resolution, but I have not found anything yet. No matter how I implode, explode, UTF-8 encode, etc., they will not compare the strings or change type.
The letter "a" may actually be a different character that only looks the same. For example: blаh. Here the "а" is a Cyrillic 'а'.
All of these letters are Cyrillic, but they look like Latin letters: у, е, х, а, р, о, с
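One quick way to check for look-alike characters is to test whether a string contains any bytes outside the ASCII range. This is only a minimal sketch: the helper name hasNonAscii is made up for illustration, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true if the string contains any byte outside 7-bit ASCII.
function hasNonAscii(string $value): bool {
    return preg_match('/[^\x00-\x7F]/', $value) === 1;
}

$latin    = "blah";         // all Latin letters
$cyrillic = "bl\u{0430}h";  // U+0430 CYRILLIC SMALL LETTER A in place of "a"

var_dump(hasNonAscii($latin));     // bool(false)
var_dump(hasNonAscii($cyrillic));  // bool(true)

// The hex dumps make the difference visible.
echo bin2hex($latin), "\n";     // 626c6168
echo bin2hex($cyrillic), "\n";  // 626cd0b068
```

The Cyrillic letter shows up as the two bytes d0 b0 where the Latin "a" would be the single byte 61.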
Trim the strings before comparing. They may contain invisible characters, such as \t, \n, or NUL bytes (\0), all of which trim() strips by default.
$clean_str = trim($str);
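For the strings in the question, the hex dump shows three trailing %00 (NUL) bytes, which trim() removes by default. A minimal sketch, with variable names chosen only for illustration:

```php
<?php
// The 14-byte string from the question: "http://blah" followed by three NUL bytes.
$padded = "http://blah\0\0\0";
$plain  = "http://blah";

var_dump($padded === $plain);        // bool(false)
var_dump(trim($padded) === $plain);  // bool(true)

// rtrim() with an explicit character list removes only trailing NUL bytes,
// leaving any other whitespace untouched.
var_dump(rtrim($padded, "\0") === $plain);  // bool(true)
```

If only the NUL padding should go, the rtrim() variant is the narrower choice.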
When using var_dump(), string(14) means that the value is a string that holds 14 bytes. So string(11) and string(14) are not different "types" of strings; they are just strings of different length.
I would use something like this to see what actually is inside those strings:
function strToHex($value, $prefix = '') {
    $result = '';
    $length = strlen($value);
    for ($n = 0; $n < $length; $n++) {
        $result .= $prefix . sprintf('%02x', ord($value[$n]));
    }
    return $result;
}
echo strToHex("test\r\n", '%');
Output:
%74%65%73%74%0d%0a
This decodes as:
- %74 - t
- %65 - e
- %73 - s
- %74 - t
- %0d - \r (carriage return)
- %0a - \n (line feed)
Or, as pointed out in the comments by Karolis, you can use the built-in function bin2hex():
echo bin2hex("test\r\n");
Output:
746573740d0a
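Once the bytes are visible, it can also help to find where two strings first diverge. The following is a rough sketch rather than a standard function; firstDifference is a hypothetical helper name, and the sample strings are the ones from the question.

```php
<?php
// Hypothetical helper: returns the byte offset where two strings first differ,
// or -1 if they are byte-for-byte identical.
function firstDifference(string $a, string $b): int {
    $shorter = min(strlen($a), strlen($b));
    for ($i = 0; $i < $shorter; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // No mismatch in the shared part: equal, or one string is a prefix of the other.
    return strlen($a) === strlen($b) ? -1 : $shorter;
}

$first  = "http://blah\0\0\0";
$second = "http://blah";

$offset = firstDifference($first, $second);
echo $offset, "\n";                           // 11
echo bin2hex(substr($first, $offset)), "\n";  // 000000
```

Here the strings agree for the first 11 bytes, and the remainder of the longer one is three NUL bytes.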
Try to trim these strings:
if (trim($string1) == trim($string2)) {
// Do things
}
strlen() and var_dump() count bytes, not characters, so characters outside the ASCII range are probably being counted as two or more bytes each.
Use mb_strlen() to check lengths in characters.
Also, some characters may be present but not visible (there are many Unicode space characters, etc.).
Generally, when you work with Unicode strings, you should use the mb_* string functions.
You could overload the standard string functions in php.ini (mbstring.func_overload) so that they always use the mb_* equivalents, but that setting was deprecated in PHP 7.2 and removed in PHP 8.0 (I am not sure whether Xdebug honors it).
PHP 6 was expected to solve this problem by being globally Unicode-aware, but it was never released, so the mb_* functions remain the way to handle multibyte strings.
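The gap between byte length and character length is easy to see with a short test. This sketch assumes the mbstring extension is loaded and that the string is UTF-8 encoded; the sample word is illustrative only.

```php
<?php
// Assumes the mbstring extension is available and the string is UTF-8.
$word = "bl\u{0430}h";  // contains U+0430 CYRILLIC SMALL LETTER A (2 bytes in UTF-8)

echo strlen($word), "\n";              // 5 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n";  // 4 (characters)
```

Note that NUL bytes count as one character each in both functions, so mb_strlen() alone will not reveal padding like the kind shown in the question's hex dump.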