Should regular expressions handle case or should we specify the case to match in code?
Following on from this question I asked yesterday:
Can I shorten this regular expression?
The solution was to use the following expression:
^([a-z]{5}-){4}[a-z]{5}$
To check for a match for a string with the following format:
aBcDe-fghIj-KLmno-pQRsT-uVWxy
I was advised to omit the A-Z from my original query and make the regular expression case-insensitive in the code that uses it. For example, specify RegexOptions.IgnoreCase in the constructor for the Regex in C#.
Is there any reason why this should be done in code rather than the regular expression itself?
I think this question is valid enough to warrant a new question rather than continuing the discussion in yesterday's.
There is no absolutely correct answer to this question. There are several ways to achieve certain things, and which is best is sometimes subjective. Besides, the two ways aren't exactly identical to begin with.
It should be noted that a regex pattern can in fact be partially case-insensitive. That is, you can have a pattern that is case insensitive in one part, but case sensitive in other parts.
Perhaps a good guideline is the following:
- The case-insensitivity flag can be used to indicate that (barring embedded modifiers that override the setting) the entire pattern matching process is case-insensitive
- If case-insensitivity doesn't apply to the entire pattern matching process, you may choose to drop the flag and instead make it explicit which parts are case-insensitive
Do note that there is in fact a big difference between these two patterns:
/([a-z]+)-\1/i
/([A-Za-z]+)-\1/
Both patterns match "FOO-FOO" and "bar-bar", but the first pattern matches "BOO-boo" (as seen on rubular.com). The second pattern does not (as seen on rubular.com).
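To see the backreference difference outside of rubular, here is a minimal sketch in Python (my choice for illustration; the question itself is about C#). The variable names are made up for this example, and I use `fullmatch` so the whole string has to match. As far as I know, Python's `re` treats the backreference case-insensitively under the flag, just like the Ruby behaviour described above.

```python
import re

# Pattern 1: lower-case class plus the case-insensitivity flag.
# The flag also applies to the backreference \1.
flag_pattern = re.compile(r"([a-z]+)-\1", re.IGNORECASE)

# Pattern 2: both cases spelled out, no flag.
# The backreference \1 must repeat the captured text exactly.
explicit_pattern = re.compile(r"([A-Za-z]+)-\1")

for text in ("FOO-FOO", "bar-bar", "BOO-boo"):
    with_flag = flag_pattern.fullmatch(text) is not None
    without_flag = explicit_pattern.fullmatch(text) is not None
    print(f"{text}: flag={with_flag}, explicit={without_flag}")
```

Only "BOO-boo" separates the two: the flagged pattern accepts it, the explicit one rejects it.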
See also
- regular-expressions.info/Modifiers
  - Specifying Modes Inside The Regular Expression
    - Instead of /regex/i (Pattern.CASE_INSENSITIVE in Java), you can do /(?i)regex/
  - Turning Modes On and Off for Only Part of The Regular Expression
    - You can also do /first(?i)second(?-i)third/
  - Modifier Spans
    - You can also do /first(?i:second)third/
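A rough sketch of the inline-modifier forms, again in Python purely for illustration (variable names are hypothetical). It assumes Python 3.6 or later, where the `(?i:...)` span syntax is available. I leave out the `(?i)...(?-i)` on/off form because, as far as I know, Python's `re` only accepts a bare `(?i)` at the very start of a pattern; flavors such as Java and .NET are more permissive there.

```python
import re

# Mode set from code vs. the same mode embedded in the pattern itself.
by_flag = re.compile(r"regex", re.IGNORECASE)
inline = re.compile(r"(?i)regex")

# Modifier span: only "second" is matched case-insensitively,
# while "first" and "third" stay case-sensitive.
partial = re.compile(r"first(?i:second)third")

print(by_flag.fullmatch("ReGeX") is not None)
print(inline.fullmatch("ReGeX") is not None)
print(partial.fullmatch("firstSECONDthird") is not None)
print(partial.fullmatch("FIRSTsecondthird") is not None)
```

The first three checks succeed; the last one fails because "FIRST" sits outside the case-insensitive span.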
Related questions
- Case sensitive and insensitive in the same pattern
- Can you make just part of a regex case-insensitive?
Is there any reason why this should be done in code rather than the regular expression itself?
I can think of cases where you need to match including case, so you want the control in the regex itself. I can also think of cases where being able to just write in lower case and then setting the engine to case-insensitive will make the expression easier to write and maintain. The enclosing platform and language/tools are likely to influence preferences.
Summary: for each case of using a regex there will be reasons to prefer one way over the other, but in general there is no overriding approach.
No real reason besides readability, I guess. In your case, it doesn't really matter if you provide the two extra A-Z instead of using RegexOptions.IgnoreCase, IMO. But if you use quite a lot of a-zA-Z in a regex, then it might help to use a-z and RegexOptions.IgnoreCase to shorten the regex a bit.
All a matter of personal preference if you ask me.
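For what it's worth, here is a small sketch of the two spellings side by side, using the pattern and sample string from the question. It is written in Python rather than C# (an assumption on my part, just to have something runnable), with `re.IGNORECASE` standing in for the ignore-case option; the variable names are made up.

```python
import re

sample = "aBcDe-fghIj-KLmno-pQRsT-uVWxy"

# Short pattern, case handled by the option passed from code.
with_flag = re.compile(r"^([a-z]{5}-){4}[a-z]{5}$", re.IGNORECASE)

# Longer pattern, case handled inside the expression itself.
explicit = re.compile(r"^([a-zA-Z]{5}-){4}[a-zA-Z]{5}$")

# Short pattern without the option: lower-case input only.
strict = re.compile(r"^([a-z]{5}-){4}[a-z]{5}$")

print(with_flag.match(sample) is not None)
print(explicit.match(sample) is not None)
print(strict.match(sample) is not None)
```

For this sample both spellings accept the string, and only the flag-less lower-case pattern rejects it, so the choice here really does come down to which one you find easier to read.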