Regex to match tags like <A>, <BB>, <CCC> but not <ABC>
I need a regex to match tags that look like <A>, <BB>, <CCC>, but not <ABC>, <aaa>, or <>. So the tag must consist of the same uppercase letter, repeated. I've tried <[A-Z]+>, but that doesn't work, since it also accepts mixed letters like <ABC>. Of course I can write something like <(A+|B+|C+|...)> and so on, but I wonder if there's a more elegant solution.
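To make the problem concrete, here is a minimal sketch in Python (the language is an assumption; the question doesn't name one) that runs the naive pattern <[A-Z]+> against each example tag. The character class allows any uppercase letter at every position, so nothing forces the letters to be the same.

```python
import re

# The naive pattern from the question: one or more uppercase letters
# between angle brackets. Each position may be a *different* letter.
naive = re.compile(r"<[A-Z]+>")

for tag in ["<A>", "<BB>", "<CCC>", "<ABC>", "<aaa>", "<>"]:
    # fullmatch requires the entire string to match the pattern
    print(tag, bool(naive.fullmatch(tag)))
```

<ABC> is accepted along with the wanted tags, while <aaa> and <> are correctly rejected.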
You can use something like this (see this on rubular.com):
<([A-Z])\1*>
This uses a capturing group and a backreference. Basically:
- You use (pattern) to "capture" a match
- You can then use \n in your pattern, where n is the group number, to "refer back" to what that group matched
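As a small illustration of the two bullets above, this Python sketch (the patterns are made up for illustration) shows that \1 must repeat exactly the text group 1 matched, and that \2 refers to the second group by number.

```python
import re

# Group 1 captures "ab"; \1 then requires that exact text "ab" again.
pattern = re.compile(r"(ab)\1")
print(bool(pattern.fullmatch("abab")))  # True: the second "ab" repeats group 1
print(bool(pattern.fullmatch("abac")))  # False: "ac" is not what group 1 matched

# Groups are numbered left to right by their opening parenthesis:
# \2 refers back to "b" (group 2), \1 to "a" (group 1).
two_groups = re.compile(r"(a)(b)\2\1")
print(bool(two_groups.fullmatch("abba")))  # True
```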
So in this case:
- Group 1 captures ([A-Z]), an uppercase letter immediately following <
- Then we see if we can match \1*, i.e. zero or more of that same letter, before the closing >
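Putting it together, here is a minimal Python sketch of the same pattern checked against every example from the question. The constant SAME_LETTER_TAG and the helper is_same_letter_tag are hypothetical names chosen for this illustration.

```python
import re

# "<", then one uppercase letter captured as group 1, then zero or more
# repetitions of that same letter (\1*), then ">".
SAME_LETTER_TAG = re.compile(r"<([A-Z])\1*>")

def is_same_letter_tag(tag: str) -> bool:
    """Hypothetical helper: True if the whole string is a tag like <A>, <BB>, <CCC>."""
    return SAME_LETTER_TAG.fullmatch(tag) is not None

for tag in ["<A>", "<BB>", "<CCC>", "<ABC>", "<aaa>", "<>"]:
    print(tag, is_same_letter_tag(tag))

# Without anchoring, the same pattern also finds such tags inside longer text.
found = [m.group(0) for m in SAME_LETTER_TAG.finditer("x <A> y <ABC> z <BB>")]
print(found)  # ['<A>', '<BB>']
```

For <ABC>, group 1 captures A, \1* matches zero more A's, and then > fails against B, so the tag is rejected. Using fullmatch checks a standalone string, while finditer scans a larger text.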
References
- regular-expressions.info/Grouping and Backreference