Small issue with whitespace/punctuation in Python?

2023-03-10 17:34 Question author:

I have this function that will convert text language into English:

def translate(string):
    textDict={'y':'why', 'r':'are', "l8":'late', 'u':'you', 'gtg':'got to go',
        'lol': 'laugh out    loud', 'ur': 'your',}
    translatestring = ''
    for word in string.split(' '):
        if word in textDict:
            translatestring = translatestring + textDict[word]
        else:
            translatestring = translatestring + word
    return translatestring

However, if I want to translate 'y u l8?' it will return 'whyyoul8?'. How would I go about separating the words when I return them, and how do I handle punctuation? Any help appreciated!

One-liner comprehension (needs import re, and textDict in scope):

''.join(textDict.get(word, word) for word in re.findall(r'\w+|\W+', string))

[Edit] Fixed regex.
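
To make the one-liner above easier to try out, here is a minimal self-contained sketch. It assumes the dictionary from the question is available at module level under the same name, textDict; wrapping the expression in a translate function is my own framing, not part of the original answer.

```python
import re

# Same abbreviations as in the question; the name textDict is kept for continuity.
textDict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
            'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}

def translate(string):
    # \w+ grabs runs of word characters and \W+ grabs everything in between,
    # so joining the tokens back together reproduces the original spacing.
    tokens = re.findall(r'\w+|\W+', string)
    return ''.join(textDict.get(token, token) for token in tokens)

print(re.findall(r'\w+|\W+', 'y u l8?'))  # ['y', ' ', 'u', ' ', 'l8', '?']
print(translate('y u l8?'))               # why you late?
```

Because spaces and punctuation come back as their own tokens, nothing has to be re-inserted by hand after the lookup.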

You're adding words to a string without spaces. If you're going to do things this way (instead of the way suggested to you in your previous question on this topic), you'll need to manually re-add the spaces, since you split on them.

"y u l8" split on " " gives ["y", "u", "l8"]. After substitution, you get ["why", "you", "late"], and you're concatenating these without adding spaces, so you get "whyyoulate". Both branches of the if should be inserting a space.
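
As a rough sketch of that fix, one option is to collect the translated words in a list and let ' '.join put the spaces back, which avoids a trailing space. The names TEXT_DICT and translated are placeholders of mine, and this version still splits on spaces only, so punctuation attached to a word is not handled.

```python
# Hypothetical module-level copy of the dictionary from the question.
TEXT_DICT = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
             'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}

def translate(string):
    translated = []
    for word in string.split(' '):
        if word in TEXT_DICT:
            translated.append(TEXT_DICT[word])
        else:
            translated.append(word)
    # Re-insert the spaces that split(' ') removed.
    return ' '.join(translated)

print(translate('y u l8'))   # why you late
print(translate('y u l8?'))  # why you l8?  ('l8?' is not a key, so it passes through)
```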

You can just add a + ' ' + to add a space. However, I think what you're trying to do is this:

import re

def translate_string(text):
    textDict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you', 'gtg': 'got to go',
                'lol': 'laugh out loud', 'ur': 'your'}
    translatestring = ''
    # The capturing group makes re.split keep the separators (spaces,
    # punctuation) in the result, so they are copied through unchanged.
    for word in re.split(r'(\W+)', text):
        if word in textDict:
            translatestring += textDict[word]
        else:
            translatestring += word

    return translatestring


print(translate_string('y u l8?'))

This will print:

why you late?

This code handles stuff like question marks a bit more gracefully and preserves spaces and other characters from your input string, while retaining your original intent.

I'd like to suggest replacing this loop:

for word in string.split(' '):
    if word in textDict:
        translatestring = translatestring + textDict[word]
    else:
        translatestring = translatestring + word

with this one:

for word in string.split(' '):
    translatestring += textDict.get(word, word)

dict.get(foo, default) looks up foo in the dictionary and returns default if foo isn't a key in it.
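
A quick illustration of that behaviour, using a cut-down dictionary that I made up for the example:

```python
text_dict = {'y': 'why', 'u': 'you'}

hit = text_dict.get('y', 'y')              # key exists: the stored value is returned
miss = text_dict.get('hello', 'hello')     # key missing: the default is returned
no_default = text_dict.get('hello')        # no default given: None is returned

print(hit, miss, no_default)  # why hello None
```

Passing the word itself as the default is what lets unknown words fall through unchanged.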

(Time to run, short notes now: When splitting, you could split based on punctuation as well as whitespace, save the punctuation or whitespace, and re-introduce it when joining the output string. It's a bit more work, but it'll get the job done.)
