Small issue with whitespace/punctuation in Python?
I have this function that converts text-message shorthand into English:
def translate(string):
    textDict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you', 'gtg': 'got to go',
                'lol': 'laugh out loud', 'ur': 'your'}
    translatestring = ''
    for word in string.split(' '):
        if word in textDict:
            translatestring = translatestring + textDict[word]
        else:
            translatestring = translatestring + word
    return translatestring
However, if I want to translate "y u l8?", it will return "whyyoul8?". How would I go about separating the words when I return them, and how do I handle punctuation? Any help appreciated!
A one-liner using a generator expression (needs import re):
''.join(textDict.get(word, word) for word in re.findall(r'\w+|\W+', string))
[Edit] Fixed regex.
You're adding words to a string without spaces. If you're going to do things this way (instead of the way suggested to you in your previous question on this topic), you'll need to manually re-add the spaces, since you split on them.
"y u l8" split on " ", gives ["y", "u", "l8"]. After substitution, you get ["why", "you", "late"] - and you're concatenating these without adding spaces, so you get "whyyoulate". Both forks of the if should be inserting a space.
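As a rough sketch of that fix, one option is to collect the translated words in a list and join them with a single space, which avoids a trailing space at the end. The function name translate_with_spaces is made up for this illustration, and punctuation is still not handled here:

```python
def translate_with_spaces(string):
    # Same dictionary as in the question.
    text_dict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
                 'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}
    translated = []
    for word in string.split(' '):
        if word in text_dict:
            translated.append(text_dict[word])
        else:
            translated.append(word)
    # Put back the spaces that split(' ') removed.
    return ' '.join(translated)


print(translate_with_spaces('y u l8'))   # why you late
print(translate_with_spaces('y u l8?'))  # why you l8?  ("l8?" is not a key)
```

The second call shows why punctuation needs separate treatment: "l8?" is looked up as a whole and is not found.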
You can just add a + ' ' + to add a space. However, I think what you're trying to do is this:
import re

def translate_string(text):
    textDict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you', 'gtg': 'got to go',
                'lol': 'laugh out loud', 'ur': 'your'}
    translatestring = ''
    # The capturing group makes re.split keep the separators
    # (spaces, punctuation) in the result, so they can be passed through.
    for word in re.split(r'(\W+)', text):
        if word in textDict:
            translatestring += textDict[word]
        else:
            translatestring += word
    return translatestring

print(translate_string('y u l8?'))
This will print:
why you late?
This code handles stuff like question marks a bit more gracefully and preserves spaces and other characters from your input string, while retaining your original intent.
I'd like to suggest replacing this loop:
for word in string.split(' '):
    if word in textDict:
        translatestring = translatestring + textDict[word]
    else:
        translatestring = translatestring + word
with this one-line version:
for word in string.split(' '): translatestring += textDict.get(word, word)
The dict.get(foo, default) call will look up foo in the dictionary and use default if foo isn't already defined.
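A small illustration of how dict.get behaves, using a cut-down dictionary invented for this example (text_dict and the sample keys are not from the original code):

```python
text_dict = {'y': 'why', 'u': 'you'}

# Key present: the stored value is returned.
found = text_dict.get('y', 'y')

# Key missing: the supplied default (here, the word itself) is returned.
missing = text_dict.get('l8?', 'l8?')

# Key missing and no default given: None is returned, with no KeyError.
no_default = text_dict.get('zzz')

print(found)       # why
print(missing)     # l8?
print(no_default)  # None
```

Passing the word as its own default is what lets unknown words fall through unchanged.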
(Time to run, short notes now: When splitting, you could split based on punctuation as well as whitespace, save the punctuation or whitespace, and re-introduce it when joining the output string. It's a bit more work, but it'll get the job done.)