Regex to match tags like <A>, <BB>, <CCC> but not <ABC>

I need a regex to match tags that look like <A>, <BB>, <CCC>, but not <ABC>, <aaa>, or <>. The tag must consist of the same uppercase letter, repeated one or more times. I've tried <[A-Z]+>, but that doesn't work because it also matches mixed-letter tags such as <ABC>. Of course I could write something like <(A+|B+|C+|...)>, but I wonder if there's a more elegant solution.


You can use something like this (see this on rubular.com):

<([A-Z])\1*>

This uses a capturing group and a backreference. Basically:

  • You use (pattern) to "capture" a match
  • You can then use \n in your pattern, where n is the group number, to "refer back" to what that group matched
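As a minimal sketch of those two ideas on their own, here is a small Python example (Python's re module is my choice for illustration, and the name doubled is hypothetical). It finds a character that is immediately followed by itself:

```python
import re

# Group 1 captures a single word character;
# \1 must then match that exact same character again.
doubled = re.compile(r"(\w)\1")

m = doubled.search("hello")
print(m.group(0))  # the whole match: "ll"
print(m.group(1))  # what group 1 captured: "l"

# No character in "world" is immediately repeated, so there is no match.
print(doubled.search("world"))  # None
```

Note that \1 matches the text that group 1 actually captured, not the pattern inside the group, which is what makes it useful for "same letter repeated" problems.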

So in this case:

  • Group 1 captures ([A-Z]), an uppercase letter immediately following <
  • Then we see if we can match \1*, i.e. zero or more of that same letter
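To check the pattern against the examples from the question, here is a short sketch in Python (an assumption on my part, since the question does not name a language; TAG and is_repeated_letter_tag are hypothetical names):

```python
import re

# < , one uppercase letter (group 1), zero or more repeats of that letter, >
TAG = re.compile(r"<([A-Z])\1*>")

def is_repeated_letter_tag(text):
    """Return True if text is exactly one tag made of a single repeated uppercase letter."""
    return TAG.fullmatch(text) is not None

for candidate in ["<A>", "<BB>", "<CCC>", "<ABC>", "<aaa>", "<>"]:
    print(candidate, is_repeated_letter_tag(candidate))
```

The first three candidates print True and the last three print False. <ABC> fails because after capturing A, \1* can only consume more A's, so the next required character > does not line up with B.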

References

  • regular-expressions.info/Grouping and Backreference