why do these regex tests let certain characters pass?

2022-12-15 00:04 问答作者：

I am checking a string with the following regexes:

[a-开发者_如何学编程zA-Z0-9]+
[A-Za-z]+

For some reason, the characters:

.
-
_

are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:

^[a-zA-Z0-9]+$

Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.

To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.

However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.

Similarly with _-abcdef- – the regex will match the substring abcdef just fine, because you didn't tell it to match only at the start or end of the string; and ignore the other characters.

So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.

In regular expressions the + tells the engine to match one or more characters.

So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.

The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.

So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. But would allow strings that include non-alphanumerics so long as those characters were not at the beginning of the string.

While ^[A-Za-z0-9]+$ means 'match a string that begins and ends with a sequence of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.

继续阅读：regex

why do these regex tests let certain characters pass?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？