How to split a CamelCase string into its substrings in Ruby?
I have a nice CamelCase string such as ImageWideNice or ImageNarrowUgly. Now I want to break that string into its substrings, such as Image, Wide or Narrow, and Nice or Ugly.
I thought this could be solved simply by
camelCaseString =~ /(Image)((Wide)|(Narrow))((Nice)|(Ugly))/
But strangely, this will only fill $1 and $2, but not $3.
Do you have a better idea for splitting that string?
s = 'nowIsTheTime'
s.split /(?=[A-Z])/
=> ["now", "Is", "The", "Time"]
(?=pattern) is an example of positive lookahead. It matches a position in the string right before pattern. It doesn't consume the characters, that is, it doesn't include pattern as part of the match. Another example:
irb> 'streets'.sub /t(?=s)/, '-'
=> "stree-s"
In this case the s is matched by the lookahead (so only the second t matches), but it is not replaced. Thanks to @Bryce and his regexp doc link. Bryce Anderson adds an explanation:
The ?= at the beginning of the () match group is called positive lookahead, which is just a way of saying that while the regex is looking at the characters in determining whether it matches, it's not making them part of the match. split() normally eats the in-between characters, but in this case the match itself is empty, so there's nothing [there].
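To make the difference concrete, here is a minimal sketch contrasting a consuming pattern with the zero-width lookahead. The variable names are only illustrative.

```ruby
s = "nowIsTheTime"

# Consuming pattern: each capital letter is the delimiter and is thrown away.
consumed = s.split(/[A-Z]/)

# Zero-width lookahead: the split point sits just before each capital,
# so no character is removed from the result.
kept = s.split(/(?=[A-Z])/)

# A leading capital does not produce an empty first element in Ruby.
pascal = "NowIsTheTime".split(/(?=[A-Z])/)

p consumed  # ["now", "s", "he", "ime"]
p kept      # ["now", "Is", "The", "Time"]
p pascal    # ["Now", "Is", "The", "Time"]
```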
I know this is old, but it is worth mentioning for others who might be looking for this. In Rails you could do this: "NowIsTheTime".underscore.humanize
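Outside Rails those two methods are not available on String. The following is a rough plain-Ruby approximation of the same idea, not ActiveSupport's actual implementation; the helper names underscore_sketch and humanize_sketch are made up for this example, and they ignore acronyms and namespaces.

```ruby
# Hypothetical helper: insert "_" at each lower/digit-to-upper boundary, then downcase.
def underscore_sketch(str)
  str.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical helper: turn underscores into spaces and capitalize the first word.
def humanize_sketch(str)
  str.tr("_", " ").capitalize
end

snake = underscore_sketch("NowIsTheTime")
human = humanize_sketch(snake)

p snake  # "now_is_the_time"
p human  # "Now is the time"
```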
DigitalRoss's answer is correct as it handles the general case where you do not know whether it's strict camel case (first character lower case) or Pascal case (first letter upper case).
If you know which of these forms the string is in, or you want to force one or the other, Inflector can do it.
For Pascal case:
"NowIsTheTime".titleize
For camel case:
"nowIsTheTime".titleize.camelize :lower
Even though this is a Ruby regex question and the answer by DigitalRoss is correct and shines by its simplicity, I want to add a Java answer:
// this regex doesn't work perfectly in Java and some other regex engines
"NowIsTheTime".split("(?=[A-Z])"); // ["", "Now", "Is", "The", "Time"] before Java 8; Java 8+ drops the leading ""
// this regex works whether the first character is uppercase or lowercase
"NowIsTheTime".split("(?!(^|[a-z]|$))"); // ["Now", "Is", "The", "Time"]
"nowIsTheTime".split("(?!(^|[a-z]|$))"); // ["now", "Is", "The", "Time"]
Have you tried
camelCaseString =~ /(Image)(Wide|Narrow)(Nice|Ugly)/
?
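One likely explanation for the empty $3 in the question is that Ruby numbers capture groups by their opening parenthesis, so every nested group takes a number of its own. The sketch below compares both patterns on the same input; the constant names FLAT and NESTED are only illustrative.

```ruby
# Flat groups: one capture per word position.
FLAT = /(Image)(Wide|Narrow)(Nice|Ugly)/

# Nested groups, as in the question: (Wide) is group 3 and (Narrow) is group 4,
# so group 3 stays nil whenever the string contains "Narrow".
NESTED = /(Image)((Wide)|(Narrow))((Nice)|(Ugly))/

flat = FLAT.match("ImageNarrowUgly").captures
nested = NESTED.match("ImageNarrowUgly").captures

p flat    # ["Image", "Narrow", "Ugly"]
p nested  # ["Image", "Narrow", nil, "Narrow", "Ugly", nil, "Ugly"]
```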
Input: "ImageWideNice".scan(/[A-Z][a-z]+/).join(",")
Output: "Image,Wide,Nice"
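Note that this pattern requires every word to start with a capital, so the first word of a lower camel case string is skipped. A small sketch, assuming you also want to keep that leading lowercase word, makes the capital optional:

```ruby
strict = /[A-Z][a-z]+/   # pattern from the answer above
loose  = /[A-Z]?[a-z]+/  # assumption: the leading capital is optional

pascal       = "ImageWideNice".scan(loose)
camel_strict = "nowIsTheTime".scan(strict)
camel_loose  = "nowIsTheTime".scan(loose)

p pascal        # ["Image", "Wide", "Nice"]
p camel_strict  # ["Is", "The", "Time"]
p camel_loose   # ["now", "Is", "The", "Time"]
```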
The answer from DigitalRoss will not recognize acronyms embedded in the CamelCase. For example, it will split "MyHTMLTricks" into "My H T M L Tricks" instead of "My HTML Tricks".
Here is another option based on the AsSpaced() function in PmWiki, which does a great job of being sensitive to cases like this:
"MyHTMLTricks" \
.gsub(/([[:lower:]\d])([[:upper:]])/, '\1 \2') \
.gsub(/([^-\d])(\d[-\d]*( |$))/,'\1 \2') \
.gsub(/([[:upper:]])([[:upper:]][[:lower:]\d])/, '\1 \2')
=> "My HTML Tricks"
The other thing I like about this approach is that it leaves the string a string, instead of transforming it into an array. If you really want the array, then just add a split at the end.
"MyHTMLTricks" \
.gsub(/([[:lower:]\d])([[:upper:]])/, '\1 \2') \
.gsub(/([^-\d])(\d[-\d]*( |$))/,'\1 \2') \
.gsub(/([[:upper:]])([[:upper:]][[:lower:]\d])/, '\1 \2') \
.split
=> ["My", "HTML", "Tricks"]
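For reuse, the three substitutions can be wrapped in a method. This is a sketch only: the name as_spaced is hypothetical, and the digit class is written as \d with a single backslash, since inside a Ruby regex literal a doubled backslash would match a literal backslash rather than a digit.

```ruby
# Hypothetical helper mirroring the three AsSpaced() substitutions.
def as_spaced(text)
  text
    .gsub(/([[:lower:]\d])([[:upper:]])/, '\1 \2')             # "yH" -> "y H"
    .gsub(/([^-\d])(\d[-\d]*( |$))/, '\1 \2')                  # separate a number that ends a word
    .gsub(/([[:upper:]])([[:upper:]][[:lower:]\d])/, '\1 \2')  # "LTr" -> "L Tr"
end

p as_spaced("MyHTMLTricks")        # "My HTML Tricks"
p as_spaced("Route66Diner")        # "Route 66 Diner"
p as_spaced("MyHTMLTricks").split  # ["My", "HTML", "Tricks"]
```

The third substitution is what keeps the acronym together: it only breaks between two capitals when the second one starts a new word.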
For the record, here is the original PHP code from PmWiki.
function AsSpaced($text) {
  $text = preg_replace("/([[:lower:]\\d])([[:upper:]])/", '$1 $2', $text);
  $text = preg_replace('/([^-\\d])(\\d[-\\d]*( |$))/', '$1 $2', $text);
  return preg_replace("/([[:upper:]])([[:upper:]][[:lower:]\\d])/", '$1 $2', $text);
}
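def solution(string)
  final_str = []
  string.chars.each do |x|
    # Insert a space before each uppercase letter. Testing against [[:upper:]]
    # avoids treating digits and punctuation as uppercase, which
    # x.upcase == x would do.
    final_str << " " if x.match?(/[[:upper:]]/)
    final_str << x
  end
  # strip drops the leading space produced when the string starts with a capital
  final_str.join.strip
end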