Japanese in Python function

I wrote a function in Python which is used to tell me whether the two words are similar or not.

Now I want to pass Japanese text to the same function. It gives the error "not a ascii character." I tried using UTF-8 encoding, but it still gives the same error:

Non-ASCII character '\xe3' in file

Is there any way to do that? I can't generate the msg file for that, since the two keywords will not be constant.

Here goes the code

def filterKeyword(keyword, adText, filterType):
    if (filterType == 'contains'):
        try:
            adtext = str.lower(adText)
            keyword = str.lower(keyword)
            if (adtext.find(keyword)!=-1):
                return '0'
        except:
            return '1'
    if (filterType == 'exact'):
        var = cmp(str.lower(adText), str.lower(keyword))
        if(var == 0 ):
            return '0'

    return '1'

I have used the following:

filterKeyword(unicode('ポケモン').encode("utf-8"), unicode('黄色のポケモン').encode("utf-8"), 'contains')

filterKeyword('ポケモン'.encode("utf-8"), '黄色のポケモン'.encode("utf-8"), 'contains')

Both of them are giving the error.


This worked for me:

# -*- coding: utf-8 -*-

def filterKeyword(keyword, adText, filterType):
    # same as yours

filterKeyword(u'ポケモン', u'黄色のポケモン', 'contains')


Please do not do this:

adtext = str.lower(adText)
keyword = str.lower(keyword)

Please do this:

adtext = adText.lower()
keyword = keyword.lower()

Please do not do this:

cmp(str.lower(adText), str.lower(keyword))

Please do this:

return adText.lower() == keyword.lower()

Please do not do this:

try:
    # something
except:
    # handler

Please name the exception you expect. A general superclass like Exception is fine. A bare except also catches non-error exits such as SystemExit and KeyboardInterrupt, which you cannot meaningfully handle.

try:
    # something
except Exception:
    # handler

Also, it's really unlikely that catching an exception would return True.
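To illustrate the point about errors you should not catch, here is a small sketch. The names swallow, fail and quit_now are made up for the example and are not part of the original code. It shows that except Exception handles an ordinary error but lets SystemExit pass through, which a bare except would have swallowed.

```python
def swallow(action):
    """Run action; report ordinary errors as False (hypothetical helper)."""
    try:
        action()
        return True
    except Exception:
        # Ordinary errors (ValueError, UnicodeDecodeError, ...) end up here.
        return False


def fail():
    raise ValueError("ordinary error")


def quit_now():
    raise SystemExit(1)


print(swallow(fail))  # False: the ValueError was caught

try:
    swallow(quit_now)
    exit_propagated = False
except SystemExit:
    # SystemExit does not derive from Exception, so it was not caught above.
    exit_propagated = True

print(exit_propagated)  # True
```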

Please do not do this:

return '1'
return '0'

It's unlikely you want to return a character. It's more likely you want to return True or False.

return True
return False

Your code will work if you do things properly.

>>> u'ポケモン'.lower() == u'黄色のポケモン'.lower()
False
>>> u'ポケモン'.lower() in  u'黄色のポケモン'.lower()
True
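Putting the suggestions above together, the function might look roughly like this. This is only a sketch: the snake_case names are my own, I assume both arguments are already unicode strings, I assume True should mean "matched" (the original returned '0' for a match), and an unknown filter type is treated as no match, as in the original.

```python
# -*- coding: utf-8 -*-

def filter_keyword(keyword, ad_text, filter_type):
    """Return True if ad_text matches keyword under filter_type.

    Both strings are expected to be unicode text (u'...' literals on Python 2).
    """
    keyword = keyword.lower()
    ad_text = ad_text.lower()
    if filter_type == 'contains':
        return keyword in ad_text
    if filter_type == 'exact':
        return keyword == ad_text
    # Unknown filter type: treated as "no match" (an assumption).
    return False


print(filter_keyword(u'ポケモン', u'黄色のポケモン', 'contains'))  # True
print(filter_keyword(u'ポケモン', u'黄色のポケモン', 'exact'))     # False
```

No try/except is needed here, because lowercasing and comparing two unicode strings does not involve any implicit decoding.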


Don't pass UTF-8 encoded byte strings. Use unicode objects.


Put:

# -*- coding: utf-8 -*-

in one of the first two lines of your script. This way the interpreter knows which encoding is used for the source code and the string literals in it.

And use Unicode strings wherever possible. With luck, the function may work fine with Unicode arguments (e.g. u"something…" instead of "something...") even if it was not written with Unicode in mind.


I would just like to note well:

unicode('ポケモン') (a non-unicode string constant passed to the unicode() constructor)

IS NOT THE SAME AS

u'ポケモン' (a unicode string constant)
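The difference can be sketched as follows. On Python 2 with the default settings, unicode(some_byte_string) decodes the bytes as ASCII, which fails for Japanese text. The sketch below spells that decoding step out with bytes.decode so that it also runs on Python 3; the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes: what a plain 'ポケモン' literal holds on Python 2
# when the source file is saved as UTF-8.
raw = u'ポケモン'.encode('utf-8')

try:
    # Roughly what unicode(raw) does on Python 2 with the default encoding.
    raw.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False

# Naming the real encoding works and gives back the unicode string.
text = raw.decode('utf-8')

print(decoded_as_ascii)     # False
print(text == u'ポケモン')   # True
```

So if you only have byte strings, decode them with the encoding they were written in; if you are writing a literal, just use u'...'.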
