Python RE question - proper state initial formatting
I have a string that I need to edit; it looks something like this:
string = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
As you can see, the state initial "Mn" is not properly formatted. I'm trying to use a regular expression to change this:
re.sub("[A-Z][a-z],", "[A-Z][A-Z],", string)
However, re.sub treats the second argument as a literal string and changes Mn, to [A-Z][A-Z],. How would I use re.sub (or something similarly simple) to properly change Mn, to MN, in this string?
Thank you in advance!
Your re.sub might also modify parts of the string you don't want to change. Try processing the right element of the comma-separated list explicitly:
line = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
elems = line.split(',')
elems[3] = elems[3].upper()
output = ','.join(elems)
After this, output is:
'Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited'
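A minimal sketch of the same split-and-rejoin idea using the standard csv module, which would also cope with quoted fields that contain commas. It assumes, as the answer above does, that the state is always the fourth field (index 3); the helper name fix_state is hypothetical.

```python
import csv
import io


def fix_state(line, state_index=3):
    """Upper-case one field of a comma-separated line (hypothetical helper)."""
    # Parse the single line into a list of fields, honouring CSV quoting.
    fields = next(csv.reader([line]))
    fields[state_index] = fields[state_index].upper()
    # Write the fields back out without adding a trailing newline.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(fields)
    return buf.getvalue()


line = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
print(fix_state(line))
# Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited
```

Unlike a plain split(','), a field such as "Apt 1, Main St" in double quotes stays one field, so the state index does not shift.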
You can pass a function as the replacement parameter to re.sub to generate the replacement string from the match object, e.g.:
import re
s = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
def upcase(match):
    return match.group().upper()
print(re.sub("[A-Z][a-z],", upcase, s))
(This is ignoring the concern of whether you're genuinely finding state initials with this method.)
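One way to address that concern, sketched under the assumption that a state code is always a complete two-letter field between commas: anchor the pattern with a lookbehind and a lookahead so it only matches a whole field, and accept either case. Words like "Idaho" or "Expedited" can then never match.

```python
import re

s = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"

# (?<=,) and (?=,) require a comma on each side without consuming it,
# so only a field that is exactly two letters long can match.
STATE_FIELD = re.compile(r"(?<=,)[A-Za-z]{2}(?=,)")


def upcase(match):
    return match.group().upper()


print(STATE_FIELD.sub(upcase, s))
# Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited
```

Note that the country field US also matches this pattern; upper-casing it is harmless here, but any other two-letter field would be changed too.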
The relevant documentation for re.sub says:
sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
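To illustrate the two forms the documentation describes, a small sketch: a string replacement can reuse matched text through backreferences such as \1 (written in a raw string so the backslash reaches re intact), but it cannot change case, whereas a callable receives the match object and can return any text it likes.

```python
import re

s = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"

# String repl: backreferences are expanded, so groups can be reordered,
# but the matched letters keep their original case.
swapped = re.sub(r"([A-Z])([a-z]),", r"\2\1,", s)

# Callable repl: called once per match with a Match object.
fixed = re.sub(r"([A-Z][a-z]),", lambda m: m.group(1).upper() + ",", s)

print(swapped)  # Idaho Ave N,,Crystal,nM,55427-1463,US,,610839124763,Expedited
print(fixed)    # Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited
```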
re.sub("[A-Z][a-z],", lambda m: m.group(0).upper(), myString)
I would avoid naming your variable string, since that is also the name of a standard-library module (the string type itself is str).
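A short sketch of why the name matters: once string is rebound to a str, the standard-library string module is no longer reachable under that name in the same scope.

```python
import string

# Before rebinding, the name refers to the standard-library module.
letters = string.ascii_uppercase

# Rebinding the name to a str hides the module for the rest of this scope.
string = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"

try:
    string.ascii_uppercase
    shadowed = False
except AttributeError:
    # str objects have no ascii_uppercase attribute.
    shadowed = True

print(letters, shadowed)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ True
```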
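You can create a group by surrounding part of your regex in parentheses, then refer to it by its group number. Calling .upper() on a template like "\1," would upper-case the template before re.sub ever sees it (and a non-raw "\1" is chr(1), not a backreference), so apply .upper() to the group inside a function instead:
re.sub(r"([A-Z][a-z]),", lambda m: m.group(1).upper() + ",", string)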
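A quick check of what calling .upper() on a replacement template actually does, as a sketch: the method runs on the template string before re.sub sees it, so the group is inserted with its original case; and in a non-raw string literal "\1" is the control character chr(1), not a backreference.

```python
import re

s = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
pattern = r"([A-Z][a-z]),"

# r"\1,".upper() is still r"\1,": upper() has nothing to change in the template,
# and the backreference later expands to the original "Mn".
unchanged = re.sub(pattern, r"\1,".upper(), s)

# Without the r prefix, "\1" is chr(1), so "Mn," becomes "\x01,".
mangled = re.sub(pattern, "\1,".upper(), s)

print(unchanged == s, repr(mangled))
```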