Python: convert camel case to space delimited using RegEx and taking Acronyms into account
I am trying to convert camel case to space-separated values using Python. For example:
divLineColor -> div Line Color
This line does that successfully:
label = re.sub("([A-Z])"," \g<0>",label)
The problem I am having is with things like simpleBigURL,
which should be converted like this:
simpleBigURL -> simple Big URL
I am not entirely sure how to get this result. Help!
This is one thing that I tried:
label = re.sub("([a-z])([A-Z])","\g<0> \g<1>",label)
But this produces weird results like:
divLineColor -> divL vineC eolor
I was also thinking that using the (?!...) construct could work, but I have not had any luck.
This should work with 'divLineColor', 'simpleBigURL', 'OldHTMLFile' and 'SQLServer'.
label = re.sub(r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))', r' \1', label)
Explanation:
label = re.sub(r"""
( # start the group
# alternative 1
(?<=[a-z]) # current position is preceded by a lower char
# (positive lookbehind: does not consume any char)
[A-Z] # an upper char
#
| # or
# alternative 2
(?<!\A) # current position is not at the beginning of the string
# (negative lookbehind: does not consume any char)
[A-Z] # an upper char
(?=[a-z]) # matches if next char is a lower char
# lookahead assertion: does not consume any char
) # end the group""",
r' \1', label, flags=re.VERBOSE)
If a match is found, it is replaced with ' \1', which is a string consisting of a leading blank and the match itself.
Alternative 1 matches an upper-case character, but only if it is preceded by a lower-case character. We want to translate abYZ to ab YZ and not to ab Y Z.
Alternative 2 matches an upper-case character, but only if it is followed by a lower-case character and is not at the start of the string. We want to translate ABCyz to AB Cyz and not to A B Cyz.
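As a quick check, the following sketch runs the one-line pattern over the four sample strings named at the top of this answer. The helper name split_camel is made up for this example; the pattern itself is the one shown above.

```python
import re

# Same pattern as in the answer; only the helper name is hypothetical.
PATTERN = re.compile(r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))')

def split_camel(label):
    """Insert a blank before each upper-case char that starts a new word."""
    return PATTERN.sub(r' \1', label)

for sample in ('divLineColor', 'simpleBigURL', 'OldHTMLFile', 'SQLServer'):
    print(sample, '->', split_camel(sample))
```

This should print div Line Color, simple Big URL, Old HTML File and SQL Server, one per line.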
\g<0> references the matched string of the whole pattern, while \g<1> references the matched string of the first subpattern ((…)). So you should use \g<1> and \g<2> instead:
label = re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", label)
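To make the difference visible, here is a small sketch comparing the two replacement strings on the same input. The variable names are made up for illustration. Note that this two-group pattern only splits at a lower-to-upper boundary, so an acronym stays attached to the word that follows it.

```python
import re

pattern = re.compile(r"([a-z])([A-Z])")

# \g<0> is the whole match (e.g. "vL"), so part of it is written twice.
wrong = pattern.sub(r"\g<0> \g<1>", "divLineColor")
# \g<1> is the lower-case char and \g<2> the upper-case char of the match.
right = pattern.sub(r"\g<1> \g<2>", "divLineColor")

print(wrong)   # divL vineC eolor
print(right)   # div Line Color

# Limitation: no lower-case char precedes the "F", so "HTMLFile" stays together.
acronym_case = pattern.sub(r"\g<1> \g<2>", "OldHTMLFile")
print(acronym_case)  # Old HTMLFile
```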
I know, it's not regex, but you can also use map like this:
>>> s = 'camelCaseTest'
>>> ''.join(map(lambda x: x if x.islower() else " "+x, s))
'camel Case Test'
Other method:
def solution(s):
    # Put a blank in front of every upper-case character.
    return ''.join(' ' + c if c.isupper() else c for c in s)
print(solution("mehdHidi"))  # mehd Hidi
I don't think you can do it using a regular expression, because you need to remember the previous character in order to handle all cases. I think the following function works for all cases. For example, it converts 'AbcdEfgHIJKlmno' to 'Abcd Efg HIJ Klmno'.
def camel_case_to_phrase(s):
    prev = None
    t = []
    n = len(s)
    i = 0
    while i < n:
        next_char = s[i+1] if i < n - 1 else ''
        c = s[i]
        if prev is None:
            t.append(c)
        elif c.isupper() and prev.isupper():
            # Inside an upper-case run: split only before its last char
            # when a lower-case char follows.
            if next_char.islower():
                t.append(' ')
                t.append(c)
            else:
                t.append(c)
        elif c.isupper() and not prev.isupper():
            t.append(' ')
            t.append(c)
        else:
            t.append(c)
        prev = c
        i = i + 1
    return "".join(t)
(?<=[a-z])([A-Z])
or
([a-z])([A-Z])
I couldn't get a really nice regex, but this worked decently.
([a-z]+)([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?
Breaking it down:
([a-z]+)
Any series of lowercase characters.
([A-Z][a-z]+)?
Any uppercase character followed by one or more lowercase characters. This group is optional.
I then repeated the second group 4 times. This only works if you don't have more than 4 "sections" that start with an uppercase character; add or remove groups as necessary. It also works if there are fewer sections than that (e.g. it works on divLineColor). It will not match words that are all uppercase.
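The answer does not show how the groups are turned into the final string, so here is one possible sketch. Using re.fullmatch and joining the non-empty groups is an assumption of this example, and the helper name to_phrase is made up.

```python
import re

# The pattern from this answer: one lower-case run plus up to four
# optional "upper-case char + lower-case run" sections.
PATTERN = re.compile(
    r'([a-z]+)([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?([A-Z][a-z]+)?'
)

def to_phrase(label):
    """Return the space-separated phrase, or None if the label does not fit."""
    match = PATTERN.fullmatch(label)
    if match is None:
        return None
    # Unused optional groups come back as None, so filter them out.
    return ' '.join(part for part in match.groups() if part is not None)

print(to_phrase('divLineColor'))   # div Line Color
print(to_phrase('simpleBigURL'))   # None: the acronym does not fit the pattern
```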
>>> from functools import reduce  # reduce is not a builtin in Python 3
>>> def unCamel(x): return reduce(lambda a,b: a + ((b.upper() == b and (len(a) and a[-1].upper() != a[-1])) and (' ' + b) or b), x, '')
...
>>>
>>> unCamel("simpleBigURL")
'simple Big URL'
>>> unCamel("URL")
'URL'
>>> unCamel("url")
'url'
>>> unCamel("uRl")
'u Rl'
Here is another solution to an old problem. It adds a space after a character when the next character is upper case, unless the characters on either side of that next character are both upper case. The is_upper function handles the None values that the zip_longest function produces for the final two characters.
from itertools import zip_longest
def is_upper(char):
    try:
        return char.isupper()
    except AttributeError:
        # char is None (past the end of the word): treat it as upper case.
        return True
def uncamel(word):
    return ''.join(
        c0 + " " if is_upper(c1) and not (is_upper(c0) and is_upper(c2)) else c0
        for c0, c1, c2 in zip_longest(word, word[1::], word[2::])
    ).strip()
uncamel("simpleBigURLEndsWithWORDIt")
# returns: 'simple Big URL Ends With WORD It'
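To see where the None values come from, this short sketch prints the triples that zip_longest yields for a small sample word. The word "abCd" and the variable names are chosen only for illustration.

```python
from itertools import zip_longest

word = "abCd"
# Each tuple is (current char, next char, char after next); the shorter
# slices run out first and are padded with None.
triples = list(zip_longest(word, word[1:], word[2:]))
for triple in triples:
    print(triple)
```

The last two tuples contain None, which is why is_upper has to cope with a value that has no isupper method.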
Here's my simple solution, which works with PCRE-like implementations, including Python:
/(?<=[a-zA-Z])(?=[A-Z])/g
Then, simply replace all matches with a single space (' '). Putting it all together:
re.sub(r'(?<=[a-zA-Z])(?=[A-Z])', ' ', yourCamelCaseString)
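As a rough check of how this zero-width pattern behaves on acronyms, the sketch below runs it on two of the question's samples. The second pattern, acronym_aware, is a hypothetical variant that is not part of this answer; it is included only to show one way the same lookaround idea might keep an upper-case run together.

```python
import re

# Pattern from the answer: an empty match between any letter and an upper-case letter.
simple = re.compile(r'(?<=[a-zA-Z])(?=[A-Z])')
# Hypothetical variant (not from the answer): split after a lower-case char,
# or before the last upper-case char of a run when a lower-case char follows.
acronym_aware = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

print(simple.sub(' ', 'divLineColor'))          # div Line Color
print(simple.sub(' ', 'simpleBigURL'))          # simple Big U R L
print(acronym_aware.sub(' ', 'simpleBigURL'))   # simple Big URL
print(acronym_aware.sub(' ', 'OldHTMLFile'))    # Old HTML File
```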
Hope this method helps:
public static String convertCamelCaseToStatement(String camelCase) {
    StringBuilder builder = new StringBuilder();
    for (Character c : camelCase.toCharArray()) {
        if (Character.isUpperCase(c)) {
            builder.append(" ").append(c);
        } else {
            builder.append(c);
        }
    }
    return builder.toString();
}