Counting cats with regular expressions
So I want to match a string with the word "cat" in it a bunch of times, such as:
"cat cat cat cat cat"
or
"cat cat cat cat"
If there's anything else besides "cat" or whitespace, I don't want to match. So I can do:
^(cat\s*)+$
However, I want to find out how many cats appear in the string. One way to do this would be to count the number of groups; however, the above regular expression only gives me a single group with the first cat, not a capture per cat. Is there a way to do this using regular expressions?
I don't see anyone mentioning what I consider the obvious answer, using String#scan:
str = "cat cat cat catcat"
str.scan('cat').size #=> 5
If you just have to use a regex:
str.scan(/cat/).size #=> 5
If you want to count only standalone occurrences, not run-together ones:
str.scan(/\bcat\b/).size #=> 3
EDIT:
@sawa points out that there is (considerable) room for misinterpretation of the OP's question. The following covers the case where the OP doesn't want any counting to occur if something besides "cat" and " " is in the string:
str.scan('cat').size if str.gsub(/(?:cat| )+/, '').empty? #=> 5
The other variations in my previous section can still be applied.
And, since "whitespace" could mean more than a simple space, "\s" should also work fine.
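As a rough sketch of the "\s" variant, the check and the count can be wrapped in one method. The name count_cats is made up for illustration, and the sketch assumes that 'catcat' counts as two cats, as in the scan('cat') example above:

```ruby
# Hypothetical helper (not from the original answer): returns the number of
# "cat" occurrences, or nil if the string contains anything other than
# "cat" and whitespace.
def count_cats(str)
  # \A and \z anchor the whole string; \s covers tabs and newlines too.
  return nil unless str.match?(/\A(?:cat|\s)+\z/)
  # The string is known to be clean, so a plain substring scan is enough.
  # A whitespace-only string passes the check and yields 0.
  str.scan('cat').size
end

count_cats("cat cat cat catcat")
count_cats("cat\tcat\ncat")
count_cats("cat dog cat")
```

The first call counts five cats, the second three, and the third returns nil because of "dog".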
Note that Mike's original regexp, as well as the answers from Tomalak, Marten, and tagman, all give the wrong count when the string includes consecutive instances of 'cat' (unless you want to consider 'catcat' as two instances of the word 'cat'). The following does not have this problem.
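class String
  # Returns the number of cats, or nil unless the string is only cats separated by spaces.
  def count_if_match
    # Limit -1 keeps empty pieces, so n cats always yield n + 1 pieces.
    pieces = strip.split('cat', -1)
    return nil if pieces.size < 2 || !pieces.first.empty? || !pieces.last.empty?
    pieces.size - 1 if pieces[1..-2].all? { |s| s =~ /\A +\z/ }
  end
end
' cat cat cat cat'.count_if_match # => 4
' catcat cat cat'.count_if_match # => nil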
You want to do two different things - validate a string and count word occurrences. Usually you cannot do these two things in one step.
var str = "cat cat cat cat cat";
var count = 0;
if ( /^(cat\s*)+$/.test(str) ) {
count = str.match(/cat/g).length;
}
In .NET regex you have Group.Captures, which lists all the occurrences where a group matched, whereas other regex engines keep only the last one. There you could do both validating and counting in one step.
Consider translating whitespaces to newlines, then count the lines matching the regexp.
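A minimal Ruby sketch of that idea, assuming the goal is only to count words that are exactly "cat" (the method name count_cat_lines is hypothetical, and this version does not reject strings that contain other words):

```ruby
# Hypothetical helper: put one word per line, then count the lines
# that consist of exactly "cat".
def count_cat_lines(str)
  # Turn every run of whitespace into a single newline.
  lines = str.strip.gsub(/\s+/, "\n").split("\n")
  # grep keeps only the lines matching the anchored pattern.
  lines.grep(/\Acat\z/).size
end

count_cat_lines("cat cat cat cat")
count_cat_lines("catcat cat")
```

Here the first call counts four lines and the second only one, since "catcat" is not a line consisting of exactly "cat". Comparing the result with the total number of lines would add the validation step.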
It's actually the last cat you're capturing. That happens because + repeats the group and each repetition overwrites the previous capture. In most regex engines, I don't think it's possible to get more than one capture out of a group. The best thing you can do is probably:
str = "cat cat cat cat"
matchdata = str.match(/^((?:cat\s*)+)$/)
#=> #<MatchData "cat cat cat cat" 1:"cat cat cat cat">
matchdata[0].split(/\s+/).size
#=> 4
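To check which repetition the group keeps, here is a small sketch using the regexp from the question. The helper name repeated_group_capture is made up for illustration:

```ruby
# Hypothetical helper: returns what the repeated group (cat\s*) holds
# after a successful match, or nil if the string does not match.
def repeated_group_capture(str)
  m = str.match(/^(cat\s*)+$/)
  m && m[1]
end

# The group matches "cat  ", then "cat ", then "cat"; only the last survives.
last = repeated_group_capture("cat  cat cat")

# However many times the group repeats, MatchData exposes one capture for it.
capture_count = "cat cat cat cat".match(/^(cat\s*)+$/).captures.size
```

Here last is "cat" without trailing whitespace, which shows it comes from the final repetition, and capture_count is 1.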
A Ruby way without regex would be:
string = "cat cat cat cat"
def match_cat(string)
  cat_array = string.split
  count = cat_array.size
  cat_array.uniq == ["cat"] ? count : false
end
match_cat(string)
#=> 4
"cat cat cat cat".split.count{|w|
break false unless w == 'cat'
true
}