Parsing a regex

I am having trouble writing a regular expression in C#; its purpose is to extract all words that start with '@' from a given string so they can be stored in some type of data structure.

If the string is "The quick @brown fox jumps over the lazy @dog", I'd like to get an array that contains two elements: brown and dog. It needs to handle the edge cases properly. For example, if it's @@brown, it should still produce 'brown', not '@brown'.


Something like this:

C#:

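// Requires: using System.Text.RegularExpressions;
string quick = "The quick @brown fox jumps over the lazy @dog @@dog";

// Match an '@' followed by one or more word characters.
MatchCollection results = Regex.Matches(quick, @"@\w+");

foreach (Match m in results)
{
    // Drop the leading '@' and separate the words with a space.
    // Literal1 is presumably an ASP.NET Literal control on the page.
    Literal1.Text += m.Value.Replace("@", "") + " ";
}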

This takes care of your edge case too (@@dog => dog).
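Since the question asks for an array rather than display text, here is a minimal sketch of the same approach that collects the words into a string[]. The class and method names (AtWordList, Collect) are made up for illustration; the pattern and the Replace call are the ones from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AtWordList
{
    // Hypothetical helper: returns every word that follows an '@', without the '@'.
    public static string[] Collect(string input)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"@\w+"))
        {
            // m.Value is e.g. "@brown"; strip the '@' before storing it.
            words.Add(m.Value.Replace("@", ""));
        }
        return words.ToArray();
    }

    public static void Main()
    {
        string[] words = Collect("The quick @brown fox jumps over the lazy @dog @@dog");
        Console.WriteLine(string.Join(", ", words)); // brown, dog, dog
    }
}
```

For @@dog the first @ is skipped because it is not followed by a word character, so the match starts at the second @.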


@[\w\d]+ should work for you.

Tested using http://www.regextester.com/.

This works by matching the @, followed by one or more word characters. The \w represents any "word character" (a character class), the \d represents any digit, and the + (a quantifier) means one or more. Wrapping \w and \d in brackets allows either one at each position. Since \w already includes digits, @\w+ matches the same text.

To exclude the @ you could call Substring(1) on each matched string to skip its first character, or use the regex @([\w\d]+) and extract the first group.
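Both ways of dropping the @ can be sketched as follows. This is only an illustration: the class and method names (StripAt, WithSubstring, WithGroup) are invented here, and the sample sentence is the one from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StripAt
{
    // Option 1: match the whole token, then skip its first character (the '@').
    public static string[] WithSubstring(string input) =>
        Regex.Matches(input, @"@[\w\d]+")
             .Cast<Match>()
             .Select(m => m.Value.Substring(1))
             .ToArray();

    // Option 2: capture the word in group 1 and read only that group.
    public static string[] WithGroup(string input) =>
        Regex.Matches(input, @"@([\w\d]+)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        string text = "The quick @brown fox jumps over the lazy @dog";
        Console.WriteLine(string.Join(", ", WithSubstring(text))); // brown, dog
        Console.WriteLine(string.Join(", ", WithGroup(text)));     // brown, dog
    }
}
```

The group version keeps the stripping inside the pattern, which may be easier to maintain if the prefix ever becomes longer than one character.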


Depending on your definition of "word" (\w is closer to the C-language definition of a character valid in an identifier or keyword, [A-Za-z0-9_], although in .NET it also matches Unicode letters and digits by default), you might try the following. I'm defining "word" here as a sequence of non-whitespace characters:

(^|\s)(@+(?<atword>[^\s]+))(\s|$)

The above has been tested and matches as follows:

  • Match start-of-string or a whitespace character, followed by
  • 1 or more @ characters, followed by
  • 1 or more non-whitespace characters, in group named 'atword', followed by
  • a whitespace character or end-of-string.

For successful matches, the named group atword will contain the text following the lead-in @ sign(s).

So:

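  • This @@ foo has no word after the @ signs, but be aware that backtracking lets @+ give back one @, so the pattern as written still matches with atword set to "@". If that should not match, keep @ out of the first character of the group, e.g. (?<atword>[^\s@][^\s]*).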
  • This @foo bar will match.
  • @@@foobarbat is kind of silly will match.
  • @@@foobar@bazabat will match.
  • silly.@rabbit, tricks are for kids won't match, but
  • silly @rabbit, tricks are for kids will match and you'll get "rabbit," (with the trailing comma) rather than "rabbit" (like I said, you need to think about how you define "word").
  • etc.
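Several of the examples in the list can be checked with a short sketch like the one below. The class and method names (AtWordDemo, FirstAtWord) are hypothetical; the pattern is the one given earlier, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtWordDemo
{
    // The pattern from the answer, unchanged.
    private static readonly Regex AtWord =
        new Regex(@"(^|\s)(@+(?<atword>[^\s]+))(\s|$)");

    // Hypothetical helper: returns the 'atword' group of the first match,
    // or null when the input contains no match.
    public static string FirstAtWord(string input)
    {
        Match m = AtWord.Match(input);
        return m.Success ? m.Groups["atword"].Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "This @foo bar",
            "@@@foobarbat is kind of silly",
            "@@@foobar@bazabat",
            "silly.@rabbit, tricks are for kids",
            "silly @rabbit, tricks are for kids",
        };
        foreach (string s in samples)
        {
            Console.WriteLine($"{s} => {FirstAtWord(s) ?? "(no match)"}");
        }
    }
}
```

One thing to watch when using this pattern with Regex.Matches: the trailing (\s|$) consumes the whitespace after a match, so a second @word that follows immediately (as in "@a @b") has no leading whitespace left to match against.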