
multibyte identifiers list

I was looking into multi-byte characters and how they are used. How many different identifiers/patterns are used for different multi-byte characters?

e.g. &nbsp;, &#nbsp;, U+0026, %20

How many different identifiers such as &, &#, U+, %, etc. are there?

I'm trying to check inputs. If a word is more than 255 characters long, it's probably a multi-byte string (a hack attempt). I can then check whether the word contains one of these multi-byte identifiers and, if it does, stop the hack attempt.


%xx format - a URL-encoded value for embedding into URLs, e.g. %20 is a space (ASCII code 0x20, i.e. 32 in decimal)
&nbsp; - a named character entity, a non-breaking space in this case
U+0026 - a Unicode code point in hex notation, an & in this case
&#...; - a numbered character entity in decimal (base 10): &#38; = &
&#x...; - a numbered character entity in hex (base 16): &#x26; = &
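The five notations above can each be recognized with a simple regular expression. The following is a minimal sketch: `find_notations` and the category labels are hypothetical names chosen for illustration. The patterns are simplified on purpose. For example, browsers also decode some character references that lack the trailing semicolon, so a string that matches none of these patterns is not thereby proven safe.

```php
<?php
// Hypothetical helper (not from the original answer): report which of the
// notations listed above occur in $input. Patterns are simplified sketches.
function find_notations(string $input): array
{
    $patterns = [
        'url-encoded'  => '/%[0-9A-Fa-f]{2}/',        // %20
        'named entity' => '/&[A-Za-z][A-Za-z0-9]*;/', // &nbsp;
        'unicode U+'   => '/U\+[0-9A-Fa-f]{4,6}/',    // U+0026
        'decimal ref'  => '/&#[0-9]+;/',              // &#38;
        'hex ref'      => '/&#[xX][0-9A-Fa-f]+;/',    // &#x26;
    ];
    $found = [];
    foreach ($patterns as $name => $regex) {
        // preg_match() returns 1 on a match, 0 on none, false on error.
        if (preg_match($regex, $input) === 1) {
            $found[] = $name;
        }
    }
    return $found;
}
```

For example, `find_notations('a%20b &#38;')` reports both the URL-encoded form and the decimal reference, in the order the patterns are listed.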


Are you trying to prevent homoglyph-based spoofing? Does "identifier" mean username here?

If so, and if your users use the Latin alphabet, just allow only ASCII letters and digits:

// Remove every character that is not an ASCII letter or digit
$identifier = preg_replace('#[^A-Za-z0-9]+#', '', $identifier);
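Stripping silently changes what the user typed. For example, `p&#97;ypal` becomes `p97ypal`. An alternative is to reject any identifier that is not already made up solely of ASCII letters and digits. The sketch below is an assumption, not part of the original answer: `is_valid_identifier` is a hypothetical name, and the 1 to 32 character length limit is illustrative.

```php
<?php
// Hypothetical helper: accept the identifier only if it consists entirely of
// 1-32 ASCII letters/digits. The length limit of 32 is an illustrative assumption.
function is_valid_identifier(string $identifier): bool
{
    // \A and \z anchor the whole string; unlike $, \z does not allow a trailing newline.
    return preg_match('/\A[A-Za-z0-9]{1,32}\z/', $identifier) === 1;
}
```

With this approach, `is_valid_identifier('alice42')` is true, while `'p&#97;ypal'` and `"bob\n"` are both rejected instead of being rewritten.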