How do I convert a string to a valid variable name in Python?
I need to convert an arbitrary string to a string that is a valid variable name in Python.
Here's a very basic example:
s1 = 'name/with/slashes'
s2 = 'name '
def clean(s):
    s = s.replace('/', '')
    s = s.strip()
    return s
# the _ is there so I can see the end of the string
print(clean(s1) + '_')
That is a very naive approach. I need to check whether the string contains characters that are invalid in a variable name and replace them with ''.
What would be a pythonic way to do this?
Well, I'd like to best Triptych's solution with ... a one-liner!
>>> import re
>>> def clean(varStr): return re.sub(r'\W|^(?=\d)', '_', varStr)
...
>>> clean('32v2 g #Gmw845h$W b53wi ')
'_32v2_g__Gmw845h_W_b53wi_'
This substitution replaces every character that is not valid in a variable name with an underscore, and inserts an underscore at the front if the string starts with a digit. IMO, 'name/with/slashes' looks better as the variable name name_with_slashes than as namewithslashes.
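The one-liner can still return a reserved word (for example 'class') or an empty string, neither of which can be used as a variable name. A minimal sketch of one way to guard against that, assuming a trailing underscore is an acceptable fix; to_identifier is a hypothetical helper name, not part of the answer above:

```python
import keyword
import re

def clean(var_str):
    # Same substitution as the one-liner above.
    return re.sub(r'\W|^(?=\d)', '_', var_str)

def to_identifier(var_str):
    # Hypothetical wrapper (an assumption, not from the original answer).
    name = clean(var_str)
    if not name or keyword.iskeyword(name):
        # Empty results and reserved words get a trailing underscore.
        name += '_'
    return name

print(to_identifier('name/with/slashes'))  # name_with_slashes
print(to_identifier('class'))              # class_
print(to_identifier(''))                   # _
```

The trailing underscore follows the usual convention for avoiding keyword clashes, but any other suffix or prefix would work as well.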
According to Python, an identifier is a letter or underscore, followed by an unlimited string of letters, numbers, and underscores:
import re
def clean(s):
    # Remove invalid characters
    s = re.sub('[^0-9a-zA-Z_]', '', s)
    # Remove leading characters until we find a letter or underscore
    s = re.sub('^[^a-zA-Z_]+', '', s)
    return s
Use like this:
>>> clean(' 32v2 g #Gmw845h$W b53wi ')
'v2gGmw845hWb53wi'
You can use the built-in str.isidentifier() in combination with filter().
This requires no imports such as re and works by iterating over each character and keeping it if it is a valid identifier on its own. Then you just do a ''.join to convert the filtered characters back into a string. Note that digits are dropped, because a digit on its own is not a valid identifier.
s1 = 'name/with/slashes'
s2 = 'name '
def clean(s):
    s = ''.join(filter(str.isidentifier, s))
    return s
# the _ is there so I can see the end of the string
print(f'{clean(s1)}_')
EDIT:
If, like Hans Bouwmeester in the replies, you want numeric values to be included as well, you can create a lambda which uses both the isidentifier and the isdecimal methods to check the characters. Obviously this can be expanded as far as you want to take it. Code:
s1 = 'name/with/slashes'
s2 = 'name i2, i3 '
s3 = 'epng2 0-2g [ q4o 2-=2 t1 l32!@#$%*(vqv[r 0-34 2]] '
def clean(s):
    s = ''.join(filter(
        lambda c: str.isidentifier(c) or str.isdecimal(c), s))
    return s
# the _ is there so I can see the end of the string
print(f'{ clean(s1) }_')
print(f'{ clean(s2) }_')
print(f'{ clean(s3) }_')
This gives:
namewithslashes_
namei2i3_
epng202gq4o22t1l32vqvr0342_
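Once digits are kept, the result can start with a digit (or be empty), which is still not a valid variable name. A rough sketch of one way to handle that, assuming an underscore prefix is acceptable; clean_keep_digits is a hypothetical name used only for this illustration:

```python
def clean_keep_digits(s):
    # Keep characters that are identifiers on their own, plus decimal digits,
    # as in the lambda-based filter above.
    kept = ''.join(c for c in s if c.isidentifier() or c.isdecimal())
    # Assumption: prefix an underscore when the result is empty
    # or starts with a digit, so the whole string becomes an identifier.
    if not kept.isidentifier():
        kept = '_' + kept
    return kept

print(clean_keep_digits('name/with/slashes'))           # namewithslashes
print(clean_keep_digits(' 32v2 g #Gmw845h$W b53wi '))   # _32v2gGmw845hWb53wi
print(clean_keep_digits('!!!'))                         # _
```

This does not check for reserved words such as 'class'; keyword.iskeyword from the standard library can be used for that if it matters.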
You should build a regex that's a whitelist of permissible characters and replace everything that is not in that character class.
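As a sketch of what such a whitelist could look like, assuming only ASCII letters, digits and underscores are wanted (Python 3 also accepts many non-ASCII letters in identifiers, which this deliberately excludes). The replacement parameter and the underscore prefix for a leading digit are assumptions added for illustration:

```python
import re

# Whitelist of permitted characters; everything outside it is replaced.
NOT_ALLOWED = re.compile(r'[^A-Za-z0-9_]')

def clean(s, replacement=''):
    # Replace every character that is not in the whitelist.
    s = NOT_ALLOWED.sub(replacement, s)
    # Assumption: guard against an empty result or a leading digit
    # by prefixing an underscore.
    if not s or s[0].isdigit():
        s = '_' + s
    return s

print(clean('name/with/slashes'))       # namewithslashes
print(clean('name/with/slashes', '_'))  # name_with_slashes
print(clean('2 fast'))                  # _2fast
```

Passing replacement='_' keeps word boundaries visible, while the default '' matches what the question asked for.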