
Regex for matching capitals

  def normalized?
    matches = match(/[^A-Z]*/)
    return matches.size == 0
  end

This is my function operating on a string, checking whether the string contains only uppercase letters. It works fine for ruling out non-matches, but when I call it on a string like "ABC" it says no match, because apparently matches.size is 1 and not zero. There seems to be an empty element in it.

Can anybody explain why?


Your regex is wrong - if you want it to match ONLY uppercase strings, use /^[A-Z]+$/.
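One caveat about the anchors in that pattern: in Ruby, ^ and $ match at the start and end of each line, not of the whole string, so a multi-line string can pass if any single line is all uppercase. The sketch below compares them with \A and \z, which anchor to the whole string. The helper names are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helpers, not from the original post.

# ^ and $ anchor at line boundaries in Ruby.
def upper_by_line?(s)
  !!(s =~ /^[A-Z]+$/)
end

# \A and \z anchor at the very start and end of the string.
def upper_whole_string?(s)
  !!(s =~ /\A[A-Z]+\z/)
end

upper_by_line?("ABC")            # => true
upper_by_line?("abc\nABC")       # => true, because the second line matches
upper_whole_string?("abc\nABC")  # => false
```

If the input can never contain newlines, the two forms behave the same.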


Your regular expression is incorrect. /[^A-Z]*/ means "match zero or more characters that are not between A and Z, anywhere in the string". The string ABC has zero characters that are not between A and Z, so it matches the regular expression.

Change your regular expression to /^[^A-Z]+$/. This means "match one or more characters that are not between A and Z, and make sure every character between the beginning and end of the string is not between A and Z". Then the string ABC will not match, and you can check matches[0].size or similar, as per sepp2k's answer.
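One thing to watch for with this approach: when the pattern does not match, String#match returns nil, so calling matches[0] directly would raise NoMethodError. A minimal sketch that guards against that case follows; the method name is made up for this example.

```ruby
# Hypothetical helper: returns the length of the match when the string
# contains no capitals at all, or nil when the pattern does not match.
def no_capitals_length(s)
  md = s.match(/^[^A-Z]+$/)
  return nil if md.nil?   # no match: String#match returned nil
  md[0].size              # md[0] is the full matched text
end

no_capitals_length("abc")  # => 3
no_capitals_length("ABC")  # => nil
no_capitals_length("")     # => nil, because + requires at least one character
```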


MatchData#size returns the number of capturing groups in the regex plus one, so that md[i] will access a valid group iff i < md.size. So the value returned by size only depends on the regex, not the matched string, and will never be 0.

You want matches.to_s.size or matches[0].size.
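To make the difference concrete, here is a small sketch that reports both MatchData#size and the matched text for a few inputs. The helper name match_info is an assumption for this illustration only.

```ruby
# Hypothetical helper: returns [number of entries in the MatchData, matched text],
# or nil when the regex does not match at all.
def match_info(str, regex)
  md = str.match(regex)
  return nil if md.nil?
  [md.size, md[0]]
end

match_info("ABC", /[^A-Z]*/)          # => [1, ""]    size is 1, but the match is empty
match_info("abc", /[^A-Z]*/)          # => [1, "abc"] same size, different match
match_info("ABC", /([A-Z])([A-Z])/)   # => [3, "AB"]  two groups plus the whole match
match_info("ABC", /[^A-Z]+/)          # => nil        no match at all
```

The size stays fixed for a given regex; only the matched text (md[0]) changes with the input.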


ruby-1.9.2-p180>   def normalized? s
ruby-1.9.2-p180?>    s.match(/^[[:upper:]]+$/) ? true : false
ruby-1.9.2-p180?>  end
 => nil 
ruby-1.9.2-p180>  normalized? "asdf"
 => false 
ruby-1.9.2-p180>  normalized? "ASDF"
 => true 


The * in your regular expression means that it matches any number of non-uppercase characters, including zero. So it always matches, even if the match is just the empty string. The fix is to remove the *; the expression will then fail to match a string containing only uppercase characters. (You would need a different test if zero-length strings are not permitted, though.)
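As a rough sketch of that last point, the check without the * can be combined with an explicit emptiness test when zero-length strings should be rejected. The method name is invented for this example.

```ruby
# Hypothetical helper: true only for non-empty strings made of A-Z.
def normalized_nonempty?(s)
  # s !~ /[^A-Z]/ is true when no character outside A-Z is found,
  # which is also the case for "", hence the separate empty? check.
  !s.empty? && s !~ /[^A-Z]/
end

normalized_nonempty?("ABC")  # => true
normalized_nonempty?("AbC")  # => false
normalized_nonempty?("")     # => false
```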


If you want to know that the input string entirely consists of English uppercase letters, i.e. A-Z, then you must remove the Kleene Star as it will match before and after every single character in any input string (zero length match). The statement !s[/[^A-Z]/] tells you if there's no match of non-A-to-Z characters:

irb(main):001:0> def normalized? s
irb(main):002:1>     return !s[/[^A-Z]/]
irb(main):003:1> end
=> nil
irb(main):004:0> normalized? "ABC"
=> true
irb(main):005:0> normalized? "AbC"
=> false
irb(main):006:0> normalized? ""
=> true
irb(main):007:0> normalized? "abc"
=> false


Of the variants compared below, only one matches exactly the strings that consist entirely of capitals (and contain at least one):

def onlyupper(s)
  (s =~ /^[A-Z]+$/) != nil
end

Truth table:

/[^A-Z]*/:
 Testing  'asdf'     matched  'asdf'     length  4
 Testing  'HHH'      matched  ''         length  0
 Testing  ''         matched  ''         length  0
 Testing  '-=AAA'    matched  '-='       length  2
--------
/[^A-Z]+/:
 Testing  'asdf'     matched  'asdf'     length  4
 Testing  'HHH'      matched  nil
 Testing  ''         matched  nil
 Testing  '-=AAA'    matched  '-='       length  2
--------
/^[^A-Z]*$/:
 Testing  'asdf'     matched  'asdf'     length  4
 Testing  'HHH'      matched  nil
 Testing  ''         matched  ''         length  0
 Testing  '-=AAA'    matched  nil
--------
/^[^A-Z]+$/:
 Testing  'asdf'     matched  'asdf'     length  4
 Testing  'HHH'      matched  nil
 Testing  ''         matched  nil
 Testing  '-=AAA'    matched  nil
--------
/^[A-Z]*$/:
 Testing  'asdf'     matched  nil
 Testing  'HHH'      matched  'HHH'      length  3
 Testing  ''         matched  ''         length  0
 Testing  '-=AAA'    matched  nil
--------
/^[A-Z]+$/:
 Testing  'asdf'     matched  nil
 Testing  'HHH'      matched  'HHH'      length  3
 Testing  ''         matched  nil
 Testing  '-=AAA'    matched  nil
--------


This question needs a clearer answer. tchrist pointed this out in a comment, and I wish he had posted it as an answer. The "Regex for matching capitals" is:

/\p{Uppercase}/

As tchrist mentions, this "is distinct from the general category \p{Uppercase_Letter} aka \p{Lu}. That’s because there exist non-Letters that count as Uppercase".
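To show why a Unicode property can matter, the sketch below compares the ASCII range [A-Z] with the general category \p{Lu} mentioned in the quote. It uses \p{Lu} rather than \p{Uppercase}, so per the comment above it covers uppercase letters only. The helper names are assumptions, and the input is written with a \u escape so the string is UTF-8 regardless of the source file's encoding.

```ruby
# Hypothetical helpers, for illustration only.

# Accepts only the 26 ASCII capitals.
def ascii_upper?(s)
  !!(s =~ /\A[A-Z]+\z/)
end

# Accepts any character in the Unicode general category Lu (uppercase letter).
def unicode_upper?(s)
  !!(s =~ /\A\p{Lu}+\z/)
end

word = "\u00C9COLE"   # "ÉCOLE", starting with U+00C9

ascii_upper?(word)     # => false, because É is outside A-Z
unicode_upper?(word)   # => true
ascii_upper?("ABC")    # => true
unicode_upper?("ABC")  # => true
```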
