Regex code question
I'm new to this site and don't know if this is the right place to ask this question.
I was wondering if someone could explain the three regex examples below in detail.
Thanks.
Example 1
`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i
Example 2
\\1
Example 3
`[^a-z0-9]`i','`[-]+`
The first regex looks like it'll match the HTML entities for accented characters (e.g., &eacute; is é; &oslash; is ø; &aelig; is æ; and &Acirc; is Â).
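To break it down: & matches an ampersand (the start of the entity); ([a-z]{1,2}) matches one or two letters (only lowercase ones, unless the match is case-insensitive) and captures them as the first group; (acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig) matches one of the terms in the pipe-delimited list (e.g., circ, grave, cedil) and captures it as the second group; and ; matches a semicolon (the end of the entity). The i at the end is not part of the pattern itself: the backticks serve as the pattern's delimiters (as in PHP's preg_* functions), and an i after the closing delimiter is a modifier that makes the whole match case-insensitive.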
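Here is a minimal Python sketch of those two capturing groups (Python is my choice for illustration, not necessarily the language your code is in). It drops the backtick delimiters, assumes the trailing i is the usual case-insensitive modifier and expresses it as re.IGNORECASE, and uses a made-up helper name, split_entity.

```python
import re

# Example 1's pattern without the backtick delimiters; the trailing `i`
# (assumed to mean case-insensitive) becomes re.IGNORECASE.
ENTITY = re.compile(
    r"&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);",
    re.IGNORECASE,
)


def split_entity(entity):
    """Hypothetical helper: return (base letters, accent name) or None."""
    m = ENTITY.fullmatch(entity)
    return m.groups() if m else None


for e in ["&eacute;", "&oslash;", "&aelig;", "&Acirc;"]:
    print(e, split_entity(e))
# &eacute; ('e', 'acute')
# &oslash; ('o', 'slash')
# &aelig; ('ae', 'lig')
# &Acirc; ('A', 'circ')
```

Note how, for &eacute;, ([a-z]{1,2}) first grabs two letters (ea) and then backtracks to one (e) so that acute can match.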
All told, it will match the HTML entities for letters with accents or other diacritics, and for ligatures. Compared with the list of entities on this page, though, it doesn't match all of the valid ones (although it does catch many of them). Unless the regex runs in case-insensitive mode, [a-z] will match only lowercase letters, so an uppercase entity such as &Acirc; would be missed. It will also never match the entities &eth; or &thorn; (ð and þ, respectively) or their capital versions &ETH; and &THORN; (Ð and Þ), because neither eth nor thorn ends in one of the listed suffixes.
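A quick way to check those limits is a small Python sketch. Python's re module stands in for whatever regex engine your code uses, and the function name matches_entity is hypothetical. It tests each entity with and without case-insensitive mode.

```python
import re

PATTERN = r"&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);"


def matches_entity(entity, ignore_case):
    """Hypothetical helper: does the whole string match Example 1's pattern?"""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(PATTERN, entity, flags) is not None


for e in ["&eacute;", "&Acirc;", "&eth;", "&THORN;"]:
    print(e, matches_entity(e, False), matches_entity(e, True))
# &eacute; True True
# &Acirc; False True
# &eth; False False
# &THORN; False False
```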
The second example is simpler. \1 is a backreference to the first capturing group (groups are denoted by parentheses ()). Inside a regex it matches the same text that the first group matched; in the replacement part of a find-and-replace it inserts that text. What you have there, \\1, is just \1, but it's probably written in a string literal in some programming language, so the coder had to escape the backslash with another backslash. If it is used as the replacement for the first regex, it would turn an entity like &eacute; into its base letter, e.
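Here is a small Python sketch of both uses of \1. Python is only for illustration, strip_entities is a hypothetical name, and pairing \\1 with the first regex as its replacement is my assumption about how your code uses it.

```python
import re

ENTITY = re.compile(
    r"&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);",
    re.IGNORECASE,
)


def strip_entities(text):
    """Hypothetical helper: replace each matched entity with its base letter(s)."""
    # "\\1" in an ordinary string literal is the two characters \ and 1,
    # which re.sub reads as "insert whatever group 1 captured".
    return ENTITY.sub("\\1", text)


print(strip_entities("Cr&egrave;me br&ucirc;l&eacute;e"))  # Creme brulee

# Inside a pattern, \1 must match the same text group 1 matched,
# here a word that is immediately repeated.
print(re.search(r"\b(\w+) \1\b", "this is is a test").group(0))  # is is
```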
For your third example, I'm less certain what it does, but I can explain the regexes. [^a-z0-9] will match any character that's not a lowercase letter or a digit (or, in case-insensitive mode, anything that's not a letter or a digit). The caret (^) at the beginning of a character class (anything inside square brackets []) negates the class: it matches any character that is not listed, instead of the usual any character that is listed. [-]+ matches one or more hyphens (-); a hyphen on its own inside brackets is literal, so this is equivalent to -+. As for the i',' between the regexes, my best guess is that the i is the case-insensitive modifier closing the first pattern, and ',' is the end of one quoted string, a comma, and the start of the next, which would make these two separate patterns in a list (PHP's preg_replace, for example, accepts an array of patterns). You didn't say what language this is written in, though, so I can't be sure.
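In Python, the two patterns could be applied in sequence like this. This is a sketch, not your code: the question doesn't show what the matches are replaced with, so the replacement "-" and the function name hyphenate are my assumptions, and re.IGNORECASE stands in for the i on the first pattern.

```python
import re


def hyphenate(text):
    """Hypothetical helper: turn runs of non-alphanumerics into single hyphens."""
    # Assumed replacement "-": every character that is not a letter or digit
    # becomes a hyphen (IGNORECASE plays the role of the `i` modifier).
    text = re.sub(r"[^a-z0-9]", "-", text, flags=re.IGNORECASE)
    # [-]+ then collapses each run of hyphens into a single one.
    return re.sub(r"[-]+", "-", text)


print(hyphenate("Hello, World! 2024"))  # Hello-World-2024
```

Without the second step, the same input would come out as Hello--World--2024, because ", " and "! " each turn into two hyphens.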