Small issue with whitespace/punctuation in Python?

I have this function that converts text-message abbreviations into English:

def translate(string):
    textDict={'y':'why', 'r':'are', "l8":'late', 'u':'you', 'gtg':'got to go',
        'lol': 'laugh out loud', 'ur': 'your',}
    translatestring = ''
    for word in string.split(' '):
        if word in textDict:
            translatestring = translatestring + textDict[word]
        else:
            translatestring = translatestring + word
    return translatestring

However, if I want to translate y u l8? it will return whyyoul8?. How would I go about separating the words when I return them, and how do I handle punctuation? Any help appreciated!


One-liner with a generator expression (needs import re):

''.join(textDict.get(word, word) for word in re.findall(r'\w+|\W+', string))

[Edit] Fixed regex.
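
To make the one-liner easier to follow, here is a minimal runnable sketch that shows the tokens the regex produces before they are looked up. The names tokenize and text_dict are just illustrative, not from the original post; the dictionary is the one from the question.

```python
import re

# Same abbreviations as in the question.
text_dict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
             'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}

def tokenize(text):
    # \w+ grabs a run of word characters, \W+ grabs a run of anything else
    # (spaces, punctuation), so nothing from the input is lost.
    return re.findall(r'\w+|\W+', text)

def translate(text):
    # Replace tokens found in the dictionary; leave everything else as-is.
    return ''.join(text_dict.get(tok, tok) for tok in tokenize(text))

print(tokenize('y u l8?'))   # ['y', ' ', 'u', ' ', 'l8', '?']
print(translate('y u l8?'))  # why you late?
```

Because the separators are kept as tokens of their own, joining with an empty string puts the spaces and the question mark back where they were.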


You're adding words to a string without spaces. If you're going to do things this way (instead of the way suggested to you in your previous question on this topic), you'll need to manually re-add the spaces, since you split on them.


"y u l8" split on " ", gives ["y", "u", "l8"]. After substitution, you get ["why", "you", "late"] - and you're concatenating these without adding spaces, so you get "whyyoulate". Both forks of the if should be inserting a space.

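A rough sketch of that fix, keeping the if/else from the question. Instead of appending a space in each branch, it collects the words in a list and uses ' '.join, which is a small swap that avoids a trailing space. Note that it still splits only on spaces, so punctuation attached to a word is not handled.

```python
def translate(text):
    text_dict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
                 'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}
    translated = []
    for word in text.split(' '):
        if word in text_dict:
            translated.append(text_dict[word])
        else:
            translated.append(word)
    # Put back the spaces that split(' ') removed.
    return ' '.join(translated)

print(translate('y u l8'))   # why you late
print(translate('y u l8?'))  # why you l8?  ('l8?' is not a dictionary key)
```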

You can just add a + ' ' + to add a space. However, I think what you're trying to do is this:

import re

def translate_string(text):
    textDict={'y':'why', 'r':'are', "l8":'late', 'u':'you', 'gtg':'got to go',  'lol': 'laugh out loud', 'ur': 'your',}
    translatestring = ''
    # The capturing group makes re.split keep the separators (spaces, punctuation)
    for word in re.split(r'(\W+)', text):
        if word in textDict:
            translatestring += textDict[word]
        else:
            translatestring += word

    return translatestring


print(translate_string('y u l8?'))

This will print:

why you late?

This code handles stuff like question marks a bit more gracefully and preserves spaces and other characters from your input string, while retaining your original intent.


I'd like to suggest replacing this loop:

for word in string.split(' '):
    if word in textDict:
        translatestring = translatestring + textDict[word]
    else:
        translatestring = translatestring + word

with this one:

for word in string.split(' '):
    translatestring += textDict.get(word, word)

dict.get(foo, default) looks up foo in the dictionary and returns default if foo isn't a key.
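
A quick illustration of that behavior, using a cut-down dictionary made up for the example:

```python
text_dict = {'y': 'why', 'u': 'you'}

# Key present: the stored value is returned.
print(text_dict.get('y', 'y'))       # why

# Key missing: the default (here, the word itself) is returned.
print(text_dict.get('l8?', 'l8?'))   # l8?

# With no default given, a missing key yields None instead of raising KeyError.
print(text_dict.get('zzz'))          # None
```

Passing the word itself as the default is what lets unknown words pass through unchanged.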

(Time to run, short notes now: When splitting, you could split based on punctuation as well as whitespace, save the punctuation or whitespace, and re-introduce it when joining the output string. It's a bit more work, but it'll get the job done.)
