开发者

Regex code question

I'm new to this site and don't know if this is the 开发者_如何学Goplace to ask this question here?

I was wondering if someone can explain the 3 regex code examples below in detail?

Thanks.

Example 1

`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i

Example 2

\\1

Example 3

`[^a-z0-9]`i','`[-]+`


The first regex looks like it'll match the HTML entities for accented characters (e.g., é is é; ø is ø; æ is æ; and  is Â).

To break it down, & will match an ampersand (the start of the entity), ([a-z]{1,2}) will match any lowercase letter one or two times, (acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig) will match one of the terms in the pipe-delimited list (e.g., circ, grave, cedil, etc.), and ; will match a semicolon (the end of the entity). I'm not sure what the i character means at the end of the line; it's not part of the regex.

All told, it will match the HTML entities for accented/diacritic/ligatures. Compared, though, to this page, it doesn't seem that it matches all of the valid entities (although ti does catch many of them). Unless you run in case-insensitive mode, the [a-z] will only match lowercase letters. It will also never match the entities ð or þ (ð, þ, respectively) or their capital versions (Ð, Þ, also respectively).


The second regex is simpler. \1 in a regex (or in regex find-replace) simply looks for the contents of the first capturing group (denoted by parentheses ()) and (in a regex) matches them or (in the replace of a find) inserts them. What you have there \\1 is the \1, but it's probably written in a string in some other programming language, so the coder had to escape the backslash with another backslash.


For your third example, I'm less certain what it does, but I can explain the regexes. [^a-z0-9] will match any character that's not a lowercase letter or number (or, if running in case-insensitive mode, anything that's not a letter or a number). The caret (^) at the beginning of the character class (that's anything inside square brackets []) means to negate the class (i.e., find anything that is not specified, instead of the usual find anything that is specified). [-]+ will match one or more hyphens (-). I don't know what the i',' between the regexes means, but, then, you didn't say what language this is written in, and I'm not familiar with it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜