Parsing a regex
I am having trouble writing a regular expression in C#; its purpose is to extract all words that start with '@' from a given string so they can be stored in some type of data structure.
If the string is "The quick @brown fox jumps over the lazy @dog", I'd like to get an array that contains two elements: brown and dog. It needs to handle edge cases properly. For example, if it's @@brown, it should still produce 'brown', not '@brown'.
Something like this:
C#:
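using System.Collections.Generic;
using System.Text.RegularExpressions;

string quick = "The quick @brown fox jumps over the lazy @dog @@dog";
MatchCollection results = Regex.Matches(quick, "@\\w+");
var words = new List<string>();
foreach (Match m in results)
{
    // Each match includes its leading '@' (e.g. "@dog"), so strip it
    words.Add(m.Value.Replace("@", ""));
}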
This takes care of your edge case too (@@dog => dog).
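Because the question asks for an array, here is a minimal sketch of the same @\w+ approach that collects the words into a string[] with LINQ. It is written as C# 9+ top-level statements; the variable name words is just illustrative, and the pattern is written as a verbatim string (@"@\w+"), which is the same regex as "@\\w+".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The question asks for an array, so project each match straight into a string[].
string quick = "The quick @brown fox jumps over the lazy @dog @@dog";

string[] words = Regex.Matches(quick, @"@\w+")
    .Cast<Match>()                     // MatchCollection is non-generic on .NET Framework
    .Select(m => m.Value.Substring(1)) // drop the single leading '@' the pattern matched
    .ToArray();

Console.WriteLine(string.Join(", ", words));
```

The result contains dog twice because the input does; adding .Distinct() before .ToArray() would keep only unique words.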
@[\w\d]+
should work for you.
Tested using http://www.regextester.com/.
This works by matching the @, followed by one or more word characters. \w matches any "word character" and \d matches any digit; wrapping them in brackets creates a character class that accepts either, and + means "one or more". (In .NET, \w already includes digits, so [\w\d] is equivalent to plain \w.)
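A quick way to see that the \d inside [\w\d] adds nothing is to run both patterns side by side. This sketch (C# 9+ top-level statements; the input string and the Extract helper are hypothetical) also shows that \w accepts underscores, and that @@brown still yields a single-@ match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input with digits and underscores after the '@'.
string input = "ping @user_42 and @99problems, cc @@brown";

// Returns the full text of every match of the given pattern.
string[] Extract(string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] withClass = Extract(@"@[\w\d]+"); // the answer's pattern
string[] plainWord = Extract(@"@\w+");     // \w alone, since it already covers \d

Console.WriteLine(string.Join(" ", withClass));
Console.WriteLine(withClass.SequenceEqual(plainWord));
```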
To exclude the @, you could call str.Substring(1) on each match to drop the first character, or use the regex @([\w\d]+) and extract the first capture group.
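The capture-group option might look like this, a minimal sketch assuming C# 9+ top-level statements; Groups[1] refers to the first parenthesized group in @([\w\d]+).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "The quick @brown fox jumps over the lazy @dog";
var words = new List<string>();

// The parentheses create capture group 1, which holds only the part after the '@'.
foreach (Match m in Regex.Matches(text, @"@([\w\d]+)"))
{
    words.Add(m.Groups[1].Value); // Groups[0] would be the whole match, e.g. "@brown"
}

Console.WriteLine(string.Join(", ", words)); // brown, dog
```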
Depending on your definition of "word" (\w is closer to the C-language definition of a character valid in an identifier or keyword, roughly [a-zA-Z0-9_]), you might try the following. I'm defining "word" here as a sequence of non-whitespace characters:
(^|\s)(@+(?<atword>[^\s]+))(\s|$)
The above has been tested, and matches:
- start-of-string or a whitespace character, followed by
- 1 or more @ characters, followed by
- 1 or more non-whitespace characters, in a group named 'atword', followed by
- a whitespace character or end-of-string.
For successful matches, the named group atword will contain the text following the lead-in @ sign(s).
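In C#, the named group can be read with Match.Groups["atword"]. Here is a minimal sketch, assuming C# 9+ top-level statements; the sample text is the question's, with @@dog substituted to exercise the edge case, and the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The answer's pattern, unchanged; "atword" is the named group it defines.
var atWord = new Regex(@"(^|\s)(@+(?<atword>[^\s]+))(\s|$)");

string text = "The quick @brown fox jumps over the lazy @@dog";
var words = new List<string>();

foreach (Match m in atWord.Matches(text))
{
    // Read the named group instead of trimming '@' characters by hand.
    words.Add(m.Groups["atword"].Value);
}

Console.WriteLine(string.Join(", ", words)); // brown, dog
```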
So:
- `This @@ foo` will actually match as well: @+ backtracks to a single @, and atword captures the second "@", because [^\s] matches @ too.
- `This @foo bar` will match.
- `@@@foobarbat is kind of silly` will match.
- `@@@foobar@bazabat` will match.
- `silly.@rabbit, tricks are for kids` won't match, but `silly @rabbit, tricks are for kids` will match, and you'll get `rabbit,` rather than `rabbit` (like I said, you need to think about how you define "word").
- etc.
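To check the examples above, this harness runs the answer's regex over each sample string and prints what atword captures. It is a sketch assuming C# 9+ top-level statements; AtWords, Show and the second pattern lookaround are hypothetical names added for illustration, and "@brown @dog" is an extra sample not in the list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern, plus a hypothetical variant that checks the same
// boundaries with lookarounds instead of consuming the whitespace.
var atWord = new Regex(@"(^|\s)(@+(?<atword>[^\s]+))(\s|$)");
var lookaround = new Regex(@"(?<!\S)@+(?<atword>\S+)");

// Returns every atword capture the given regex finds in the input.
string[] AtWords(Regex re, string input) =>
    re.Matches(input).Cast<Match>().Select(m => m.Groups["atword"].Value).ToArray();

string Show(string[] items) => "[" + string.Join(", ", items) + "]";

string[] samples =
{
    "This @@ foo",
    "This @foo bar",
    "@@@foobarbat is kind of silly",
    "@@@foobar@bazabat",
    "silly.@rabbit, tricks are for kids",
    "silly @rabbit, tricks are for kids",
    "@brown @dog",
};

foreach (string s in samples)
{
    Console.WriteLine($"{s,-36} -> {Show(AtWords(atWord, s))} vs {Show(AtWords(lookaround, s))}");
}
```

The last sample shows a side effect of the design: (\s|$) consumes the whitespace after a match, so in "@brown @dog" the second word has no unconsumed leading space and is skipped. The lookaround variant avoids that; treat it as a suggestion to test against your own data rather than a drop-in replacement.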