Counting cats with regular expressions

So I want to match a string with the word "cat" in it a bunch of times, such as:

"cat cat cat cat cat"

or

"cat   cat cat  cat"

If there's anything else besides "cat" or whitespace, I don't want to match. So I can do:

^(cat\s*)+$

However, I want to find out how many cats appear in the string. One way to do this would be to count the number of groups, but the regular expression above gives me only a single group with the first cat, not one capture per cat. Is there a way to do this using regular expressions?


I don't see anyone mentioning what I consider the obvious answer, using String#scan:

str = "cat cat cat    catcat"
str.scan('cat').size #=> 5

If you just have to use a regex:

str.scan(/cat/).size #=> 5

If you only want to count standalone occurrences, not run-together ones:

str.scan(/\bcat\b/).size #=> 3

EDIT:

@sawa points out that there is (considerable) room for misinterpretation of the OP's question. The following covers the case where the OP does not want any count at all if the string contains something besides cat and " ":

str.scan('cat').size if str.gsub(/(?:cat| )+/, '').empty? #=> 5

The other variations in my previous section can still be applied.

And, since "whitespace" could mean more than a plain space, replacing the literal space in the pattern with "\s" should work just as well.
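
As a rough sketch of that "\s" variant, the check and the count can be wrapped in one small method. The name count_cats is made up for this illustration and is not part of the original answer. Note that it still counts run-together cats ("catcat" is two) and returns 0 for an empty string.

```ruby
# Hypothetical helper; count_cats is an illustrative name, not from the thread.
def count_cats(str)
  # Remove every run of "cat" and whitespace; anything left over is foreign text.
  return nil unless str.gsub(/(?:cat|\s)+/, '').empty?

  str.scan('cat').size
end

count_cats("cat\tcat\n cat") #=> 3
count_cats("cat dog cat")    #=> nil
```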


Note that Mike's original regexp, as well as the answers from Tomalak, Marten and tagman, all give the wrong count when the string includes consecutive instances of 'cat' (unless you want to count 'catcat' as two instances of the word 'cat'). The following does not have this problem.

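class String
  # Returns the number of space-separated 'cat's, or nil if anything else is present.
  def count_if_match
    delimiters = strip.split('cat', -1) # -1 keeps trailing empty strings
    return nil unless delimiters.size > 1 && delimiters.first.empty? && delimiters.last.empty?
    gaps = delimiters[1..-2]
    gaps.size + 1 if gaps.all? { |s| s =~ /\A +\z/ }
  end
end

' cat   cat cat  cat'.count_if_match # => 4
' catcat cat cat'.count_if_match # => nil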


You want to do two different things - validate a string and count word occurrences. Usually you cannot do these two things in one step.

var str   = "cat cat cat cat cat";
var count = 0;

if ( /^(cat\s*)+$/.test(str) ) {
  count = str.match(/cat/g).length;
}

In .NET regex you have Group.Captures, which lists every occurrence where a group matched, not just the last one as in most other regex engines. There you could do both validating and counting in one step.


Consider translating whitespace to newlines, then counting the lines that match the regexp.
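
One way this could look in Ruby is sketched below. The method name count_cat_lines is invented for the example, and returning nil for invalid or empty input is an assumption about what the OP wants.

```ruby
# Hypothetical helper illustrating the newline idea; not from the original answer.
def count_cat_lines(str)
  # Collapse each run of whitespace into a single newline, then split into lines.
  lines = str.strip.gsub(/\s+/, "\n").lines.map(&:chomp)
  return nil if lines.empty?

  # Count only lines that are exactly "cat"; any other line invalidates the string.
  matching = lines.grep(/\Acat\z/)
  matching.size == lines.size ? matching.size : nil
end

count_cat_lines("cat   cat cat  cat") #=> 4
count_cat_lines("catcat cat")         #=> nil
```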


It's actually the last cat you're capturing. That happens because of the greediness of + and the way capture groups work. I don't think it's possible to get more than one capture out of a group. The best thing you can do is probably:

str = "cat   cat cat  cat"

matchdata = str.match(/^((?:cat\s*)+)$/)
=> #<MatchData "cat   cat cat  cat" 1:"cat   cat cat  cat"> 

matchdata[0].split(/\s+/).size
=> 4
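
To make the "last cat" point concrete, here is a small sketch. It assumes Ruby's usual behaviour of keeping only the final iteration of a repeated group, and count_validated_cats is a name made up for this example. Unlike the split above, scan counts 'catcat' as two cats.

```ruby
str = "cat   cat cat  cat"

# The repeated group holds only its last iteration: "cat" with no trailing
# spaces, whereas the first iteration would have been "cat   ".
last_capture = str.match(/^(cat\s*)+$/)[1] #=> "cat"

# Hypothetical helper: validate first, then count in a second pass.
def count_validated_cats(str)
  return nil unless str.match(/\A(cat\s*)+\z/)

  str.scan(/cat/).size
end

count_validated_cats(str) #=> 4
```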


A Ruby way without regex would be:

string = "cat   cat cat  cat"
def match_cat(string)
  cat_array = string.split
  count = cat_array.size
  cat_array.uniq == ["cat"] ? count : false
end
match_cat(string)
=> 4


"cat   cat cat  cat".split.count{|w|
    break false unless w == 'cat'

    true
}