python: keep char only if it is within this list
i have a list:
a = ['a','b','c'.........'A','B','C'.........'Z']
and i have string:
string1= 's#$%E开发者_C百科RGdfhliisgdfjkskjdfW$JWLI3590823r'
i want to keep ONLY those characters in string1
that exist in a
what is the most effecient way to do this? perhaps instead of having a
be a list, i should just make it a string? like this a='abcdefg..........ABC..Z'
??
This should be faster.
>>> import re
>>> string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
>>> a = ['E', 'i', 'W']
>>> r = re.compile('[^%s]+' % ''.join(a))
>>> print r.sub('', string1)
EiiWW
This is even faster than that.
>>> all_else = ''.join( chr(i) for i in range(256) if chr(i) not in set(a) )
>>> string1.translate(None, all_else)
'EiiWW'
44 microsec vs 13 microsec on my laptop.
How about that?
(Edit: turned out, translate yields the best performance.)
''.join([s for s in string1 if s in a])
Explanation:
[s for s in string1 if s in a]
creates a list of all characters in string1, but only if they are also in the list a.
''.join([...])
turns it back into a string by joining it with nothing ('') in between the elements of the given list.
List comprehension to the rescue!
wanted = ''.join(letter for letter in string1 if letter in a)
(Note that when passing a list comprehension to a function you can omit the brackets so that the full list isn't generated prior to being evaluated. While semantically the same as a list comprehension, this is called a generator expression.)
If, you are going to do this with large strings, there is a faster solution using translate
; see this answer.
@katrielalex: To spell it out:
import string
string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
non_letters= ''.join(chr(i) for i in range(256) if chr(i) not in string.letters)
print string1.translate(None,non_letters)
print 'Simpler, but possibly less correct'
print string1.translate(None, string.punctuation+string.digits+string.whitespace)
精彩评论