I need to upgrade one of my regular expressions
Currently I use the following regular expression to validate a textArea in JSF:
"^([a-zA-Z0-9]+[a-zA-Z0-9 ]+$)?"
It allows multiple words and both uppercase and lowercase characters, but that is not enough. It should also allow a few special characters. Do you have any idea how I could tune it to:
- Allow the following 4 characters: , . ; :
- Also allow special letters from a non-English alphabet. These are the letters that are needed: Đ đ Ž ž Ć ć Č č Š š
I configured my web app to use UTF-8. If the regular expression could just allow those special letters, that would be great, because there would be less coding to validate each field every time.
Just add them to the character classes, the parts enclosed in []:
"^([a-zA-Z0-9,.;:ĐđŽžĆćČčŠš]+[a-zA-Z0-9 ,.;:ĐđŽžĆćČčŠš]+$)?"
Apart from your question, a suggestion for a performance improvement: the first part is presumably there so that the text may start with any allowed character except a space. Since that is a special case for the first character only, remove the + sign after the first character class so that it matches exactly one character. The following characters are matched by the second part anyway.
"^([a-zA-Z0-9,.;:ĐđŽžĆćČčŠš][a-zA-Z0-9 ,.;:ĐđŽžĆćČčŠš]+$)?"
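To try the pattern from Java, for instance inside a custom JSF validator, a minimal sketch could look like the one below. The class name TextAreaValidator and the method isValid are made up for illustration. The ten letters are written as Unicode escapes only so that the source compiles the same way regardless of the file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSF: wraps the pattern from the answer above.
final class TextAreaValidator {
    // D with stroke, Z caron, C acute, C caron, S caron (upper and lower case each),
    // written as Unicode escapes.
    private static final String LETTERS =
        "\u0110\u0111\u017D\u017E\u0106\u0107\u010C\u010D\u0160\u0161";

    // First character: anything allowed except a space.
    private static final String FIRST = "[a-zA-Z0-9,.;:" + LETTERS + "]";
    // Remaining characters: the same set plus the space.
    private static final String REST = "[a-zA-Z0-9 ,.;:" + LETTERS + "]";

    // The whole group is optional, so an empty field is accepted as well.
    static final Pattern PATTERN = Pattern.compile("^(" + FIRST + REST + "+$)?");

    static boolean isValid(String input) {
        // matches() only succeeds when the entire input fits the pattern.
        return PATTERN.matcher(input).matches();
    }
}
```

Note that, like the original pattern, this one needs at least two characters: a single letter such as "a" is rejected, while the empty string is accepted. If one-character input should pass, changing the final + to * would be one way to allow it.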
If the special characters all come from the same Unicode block, you can match them with an expression such as \p{InGreek}, replacing Greek with the block the characters come from. You can also use a negative lookahead to prevent matching a leading space. This would make the regex:
^(?! )[\p{Alnum}\p{InLatinExtendedA},.;: ]+$
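For reference, here is how a pattern of this shape might be used from Java. It is a sketch under a few assumptions: the class and method names are hypothetical, and the block is spelled Latin_Extended_A, the form that mirrors the Character.UnicodeBlock.LATIN_EXTENDED_A constant name. Keep in mind that the block covers every code point from U+0100 to U+017F, so it accepts more letters than the ten listed in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the block-based pattern.
final class BlockBasedValidator {
    // (?! ) is a negative lookahead: the match fails if the first character is a space.
    // \p{Alnum} is ASCII letters and digits by default; the In... property adds the block.
    static final Pattern PATTERN = Pattern.compile(
        "^(?! )[\\p{Alnum}\\p{InLatin_Extended_A},.;: ]+$");

    static boolean isValid(String input) {
        return PATTERN.matcher(input).matches();
    }
}
```

Unlike the explicit character list, this version accepts a single character and rejects the empty string, because the + requires at least one character and there is no optional group.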
If you'd rather not fail on a leading space, as your comments suggest, you can use this regex, which tolerates leading and trailing whitespace and captures the text in between:
^\s*([\p{Alnum}\p{InLatinExtendedA},.;: ]+?)\s*$
The first capturing group will be the valid string without leading or trailing whitespace.
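Reading the captured group from Java could look roughly like this. Again a sketch: TrimmingValidator and clean are invented names, returning null for invalid input is just one possible convention, and the block name uses the underscore spelling that mirrors the Character.UnicodeBlock constant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates the input and returns it without surrounding whitespace.
final class TrimmingValidator {
    // The reluctant +? keeps trailing whitespace out of group 1,
    // leaving it to the final \s* instead.
    static final Pattern PATTERN = Pattern.compile(
        "^\\s*([\\p{Alnum}\\p{InLatin_Extended_A},.;: ]+?)\\s*$");

    // Returns the trimmed text, or null when the input contains a disallowed character.
    static String clean(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(1) : null;
    }
}
```

With this approach the validator can hand the cleaned value on to the rest of the application instead of rejecting input that merely has stray spaces around it.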