Regex code question

2023-03-25 09:30 Q&A author:

I'm new to this site and don't know if this is the right place to ask this question.

I was wondering if someone can explain the 3 regex code examples below in detail?

Thanks.

Example 1

`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i

Example 2

\\1

Example 3

`[^a-z0-9]`i','`[-]+`

The first regex looks like it'll match the HTML entities for accented characters (e.g., &eacute; is é; &oslash; is ø; &aelig; is æ; and &Acirc; is Â).

To break it down, & will match an ampersand (the start of the entity), ([a-z]{1,2}) will match one or two lowercase letters, (acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig) will match one of the terms in the pipe-delimited list (e.g., circ, grave, cedil, etc.), and ; will match a semicolon (the end of the entity). I'm not sure what the i at the end of the line means; it's not part of the pattern itself, although in many regex syntaxes a trailing i after the closing delimiter is the case-insensitive flag.

All told, it will match the HTML entities for accented letters, other diacritics, and ligatures. Compared with this page, though, it doesn't seem to match all of the valid entities (although it does catch many of them). Unless you run in case-insensitive mode, the [a-z] will only match lowercase letters. It will also never match the entities &eth; or &thorn; (ð, þ, respectively) or their capital versions &ETH; and &THORN; (Ð, Þ, also respectively), because those names don't end in any of the listed suffixes.
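To see which entities the first pattern does and doesn't catch, here is a small Python sketch. You didn't say what language your code is in, so Python and the flag re.IGNORECASE (standing in for the trailing i) are my assumptions, and split_entity is just a hypothetical helper name.

```python
import re

# Same pattern as Example 1. re.IGNORECASE is an assumption about
# what the trailing "i" in the original snippet is meant to do.
ENTITY = re.compile(
    r"&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);",
    re.IGNORECASE,
)

def split_entity(text):
    """Return (letters, suffix) for the first entity found, or None."""
    m = ENTITY.search(text)
    return (m.group(1), m.group(2)) if m else None

print(split_entity("caf&eacute;"))  # ('e', 'acute')
print(split_entity("&aelig;"))      # ('ae', 'lig')
print(split_entity("&Acirc;"))      # ('A', 'circ')
print(split_entity("&eth;"))        # None: "eth" has no listed suffix
```

Without re.IGNORECASE, the &Acirc; case would return None as well, since [a-z] alone won't match the capital A.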

The second regex is simpler. \1 in a regex (or in a regex find-replace) refers to the contents of the first capturing group (denoted by parentheses ()): inside a pattern it matches that same text again, and in the replacement of a find-replace it inserts it. What you have there, \\1, is just \1, but it's probably written inside a string literal in some programming language, so the coder had to escape the backslash with another backslash.
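If Examples 1 and 2 are used together as pattern and replacement (that's a guess on my part, since you didn't show the surrounding call), the effect would be to replace each entity with its base letter(s). A minimal Python sketch, where strip_accents is a hypothetical name:

```python
import re

# Pattern from Example 1; re.IGNORECASE is assumed for the trailing "i".
ENTITY = re.compile(
    r"&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);",
    re.IGNORECASE,
)

def strip_accents(text):
    # r"\1" inserts whatever group 1 captured (the base letters).
    # In a normal, non-raw string the same thing is written "\\1",
    # which is the doubled backslash from Example 2.
    return ENTITY.sub(r"\1", text)

print(strip_accents("caf&eacute; &oslash;l &aelig;ble"))  # cafe ol aeble

# Used inside a pattern, \1 matches the same text again:
print(re.search(r"([a-z])\1", "hello").group(0))  # ll
```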

For your third example, I'm less certain what it does, but I can explain the regexes. [^a-z0-9] will match any single character that's not a lowercase letter or digit (or, if running in case-insensitive mode, anything that's not a letter or a digit). The caret (^) at the beginning of the character class (that's anything inside square brackets []) negates the class (i.e., it matches anything that is not listed, instead of the usual anything that is listed). [-]+ will match one or more consecutive hyphens (-). I don't know what the i',' between the two regexes means, but then you didn't say what language this is written in, and I don't recognize the syntax.
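One plausible use of these two patterns is to apply them one after the other with a hyphen as the replacement, which would turn arbitrary text into a URL-friendly string. That usage, the "-" replacement, the case-insensitive flag on the first pattern, and the name slugify are all assumptions of mine, not something your snippet shows. In Python it might look like this:

```python
import re

def slugify(text):
    # Step 1: every character that is not a letter or digit becomes "-".
    # flags=re.IGNORECASE is assumed to mirror the "i" after the first pattern.
    text = re.sub(r"[^a-z0-9]", "-", text, flags=re.IGNORECASE)
    # Step 2: collapse each run of hyphens into a single hyphen.
    return re.sub(r"[-]+", "-", text)

print(slugify("Hello, World!  2023"))  # Hello-World-2023
```

Note that step 1 alone would leave "Hello--World---2023"; the second pattern is what tidies up the repeated hyphens.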
