Regular expressions: get a single word out of a common phrase

I have a phrase like this:

Computer, Eddie is gone to the market.

I want to extract the word Eddie and ignore everything else. The other words are constant, while Eddie could be any word.

How can I do this with a regular expression?

Edit:

Sorry I'm using .NET regex :)


You can use this pattern:

Computer, (\w+) is gone to the market\.

This uses parentheses to group \w+ and capture whatever it matches in group 1.

Note that the period at the end has been escaped with a \ because . is a regex metacharacter.

Given the input:

LOL! Computer, Eddie is gone to the market. Blah blah
blah. Computer, Alice is gone to the market... perhaps...

Computer, James Bond is gone to the market.

Then there are two matches (as seen on rubular.com). In the first match, group 1 captured Eddie. In the second match, group 1 captured Alice.

Note that \w+ doesn't match James Bond, because \w+ is a sequence of "one or more word characters", and a space is not a word character. If you need to match names that are not a single word, replace \w+ with a pattern that matches those names.
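As a rough sketch of that replacement, the pattern below allows any number of additional space-separated words inside group 1. The class name NameCapture and the helper ExtractNames are made up for illustration, and "words separated by single spaces" is only one possible definition of a name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NameCapture
{
    // Hypothetical helper: returns whatever group 1 captured in each match.
    public static List<string> ExtractNames(string text)
    {
        // \w+ matches the first word; (?: \w+)* allows more words, each
        // preceded by one space, without creating an extra capturing group.
        var regex = new Regex(@"Computer, (\w+(?: \w+)*) is gone to the market\.");
        var names = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }

    static void Main()
    {
        var text = "LOL! Computer, Eddie is gone to the market. Blah blah\n"
                 + "blah. Computer, Alice is gone to the market... perhaps...\n"
                 + "\n"
                 + "Computer, James Bond is gone to the market.\n";

        // Prints Eddie, Alice and James Bond, one per line.
        foreach (var name in ExtractNames(text))
        {
            Console.WriteLine(name);
        }
    }
}
```

The repetition is greedy, but the regex engine backtracks until the constant tail " is gone to the market." fits, so Eddie and Alice are still captured on their own.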

References

  • regular-expressions.info/Capturing Groups and The Dot

General technique

Given this test string:

i have 35 dogs, 16 cats and 10 elephants

Then (\d+) (cats|dogs) yields two matches (as seen on rubular.com):

  • Result 1: 35 dogs
    • Group 1 captures 35
    • Group 2 captures dogs
  • Result 2: 16 cats
    • Group 1 captures 16
    • Group 2 captures cats

Related questions

  • Saving substrings using Regular Expressions

C# snippet

Here's a simple example of capturing groups usage:

using System;
using System.Text.RegularExpressions;

var text = @"

LOL! Computer, Eddie is gone to the market. Blah blah
blah. Computer, Alice is gone to the market... perhaps...

Computer, James Bond is gone to the market.

";

Regex r = new Regex(@"Computer, (\w+) is gone to the market\.");

foreach (Match m in r.Matches(text)) {
  Console.WriteLine(m.Groups[1]);
}

The above prints (as seen on ideone.com):

Eddie
Alice

API references

  • System.Text.RegularExpressions Namespace

On specification

As noted, \w+ does not match "James Bond". It does, however, match "o_o", "giggles2000", etc (as seen on rubular.com). As much as reasonably practical, you should try to make your patterns as specific as possible.

Similarly, (\d+) (cats|dogs) will match 100 cats in $100 catsup (as seen on rubular.com).

These are issues with the patterns themselves and are not directly related to capturing groups.
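One possible way to tighten the second pattern is a trailing word boundary \b, so that cats must end a word. The sketch below compares the two patterns; the class name BoundaryDemo and its members are invented for illustration, and whether a word boundary is the right constraint depends on your input.

```csharp
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    public const string Loose = @"(\d+) (cats|dogs)";
    // \b requires that the animal name is not followed by another word character.
    public const string Strict = @"(\d+) (cats|dogs)\b";

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        Console.WriteLine(Matches(Loose, "$100 catsup"));   // True: matches "100 cats"
        Console.WriteLine(Matches(Strict, "$100 catsup"));  // False: "cats" is followed by "u"
        Console.WriteLine(Matches(Strict, "i have 35 dogs, 16 cats and 10 elephants")); // True
    }
}
```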


/^Computer, \b(.+)\b is gone to the market\.$/

Eddie would be in the first captured string $1. If you specify the language, we can tell you how to extract it.

Edit: C#:

Match match = Regex.Match(input, @"^Computer, \b(.+)\b is gone to the market\.$");
Console.WriteLine(match.Groups[1].Value);

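Remove ^ and $ from the regex if the phrase can appear inside a longer string. By default in .NET they match the start and end of the input string respectively; with RegexOptions.Multiline they match the start and end of each line instead.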
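To illustrate the difference the anchors make, here is a small sketch that tries the pattern with and without ^ and $ and checks Match.Success before reading the group. The class name AnchorDemo and the helper TryGetName are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    const string Core = @"Computer, \b(.+)\b is gone to the market\.";

    // Hypothetical helper: returns true and sets name when the pattern matches.
    public static bool TryGetName(string input, bool anchored, out string name)
    {
        string pattern = anchored ? "^" + Core + "$" : Core;
        Match match = Regex.Match(input, pattern);
        // Groups[1].Value is an empty string on a failed match, so test Success first.
        name = match.Success ? match.Groups[1].Value : null;
        return match.Success;
    }

    static void Main()
    {
        string name;

        // The whole input is the phrase, so the anchored pattern matches.
        Console.WriteLine(TryGetName("Computer, Eddie is gone to the market.", true, out name)); // True

        // The phrase is embedded in a longer string: the anchored pattern fails...
        Console.WriteLine(TryGetName("LOL! Computer, Eddie is gone to the market. Blah", true, out name)); // False

        // ...while the unanchored one still finds it.
        Console.WriteLine(TryGetName("LOL! Computer, Eddie is gone to the market. Blah", false, out name)); // True
        Console.WriteLine(name); // Eddie
    }
}
```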
