Japanese text in a Python function
I wrote a function in Python that tells me whether two words are similar or not.
Now I want to pass Japanese text to the same function. It gives the error "not a ascii character." I tried using UTF-8 encoding, but it still gives the same error:
Non-ASCII character '\xe3' in file
Is there any way to do that? I can't generate the msg file for that, since the two keywords will not be constant.
Here is the code:
def filterKeyword(keyword, adText, filterType):
    if (filterType == 'contains'):
        try :
            adtext = str.lower(adText)
            keyword = str.lower(keyword)
            if (adtext.find(keyword)!=-1):
                return '0'
        except:
            return '1'
    if (filterType == 'exact'):
        var = cmp(str.lower(adText), str.lower(keyword))
        if(var == 0 ):
            return '0'
    return '1'
I have used the following:
filterKeyword(unicode('ポケモン').encode("utf-8"), unicode('黄色のポケモン').encode("utf-8"), 'contains')
filterKeyword('ポケモン'.encode("utf-8"), '黄色のポケモン'.encode("utf-8"), 'contains')
Both of them give the error.
This worked for me:
# -*- coding: utf-8 -*-
def filterKeyword(keyword, adText, filterType):
    # same as yours
filterKeyword(u'ポケモン', u'黄色のポケモン', 'contains')
Please do not do this:
adtext = str.lower(adText)
keyword = str.lower(keyword)
Please do this:
adtext = adText.lower()
keyword = keyword.lower()
Please do not do this:
cmp(str.lower(adText), str.lower(keyword))
Please do this:
return adText.lower() == keyword.lower()
Please do not do this:
try:
    # something
except:
    # handler
Please provide a specific exception. A generic superclass like Exception is fine. A bare except also traps non-error exits such as SystemExit and KeyboardInterrupt, which you cannot meaningfully handle.
try:
    # something
except Exception:
    # handler
Also, it's really unlikely that catching an exception would return True.
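To illustrate the difference between the two except forms, here is a minimal sketch. The helper names (swallow_everything, swallow_errors_only, quit_program) are made up for this example and are not part of the original code.

```python
def swallow_everything(action):
    """Run action; a bare except traps every exception, including SystemExit."""
    try:
        action()
    except:  # also catches SystemExit and KeyboardInterrupt
        return 'swallowed'


def swallow_errors_only(action):
    """Run action; only ordinary errors (Exception subclasses) are trapped."""
    try:
        action()
    except Exception:
        return 'swallowed'


def quit_program():
    # SystemExit derives from BaseException, not from Exception.
    raise SystemExit(1)


# The bare except hides the request to exit.
print(swallow_everything(quit_program))  # swallowed
```

Calling swallow_errors_only(quit_program) lets SystemExit propagate, so the program would actually exit.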
Please do not do this:
return '1'
return '0'
It's unlikely you want to return a character. It's more likely you want to return True or False.
return True
return False
Your code will work, if you do things properly.
>>> u'ポケモン'.lower() == u'黄色のポケモン'.lower()
False
>>> u'ポケモン'.lower() in u'黄色のポケモン'.lower()
True
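Putting the suggestions above together, the function could look roughly like this. It is only a sketch: the name filter_keyword is hypothetical, and it assumes you want True for a match (the original returned '0' for a match), so adjust the return values if your callers expect otherwise.

```python
# -*- coding: utf-8 -*-


def filter_keyword(keyword, ad_text, filter_type):
    """Return True when ad_text matches keyword under filter_type.

    Assumes both arguments are unicode strings (u'...' literals in Python 2).
    """
    keyword = keyword.lower()
    ad_text = ad_text.lower()
    if filter_type == 'contains':
        return keyword in ad_text
    if filter_type == 'exact':
        return keyword == ad_text
    # Unknown filter types are treated as "no match" in this sketch.
    return False


print(filter_keyword(u'ポケモン', u'黄色のポケモン', 'contains'))  # True
print(filter_keyword(u'ポケモン', u'黄色のポケモン', 'exact'))  # False
```

No try/except is needed here, because nothing in the function has to decode or encode the text.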
Don't pass UTF-8-encoded byte strings. Use unicode objects.
Put:
# -*- coding: utf-8 -*-
In one of the first two lines of your script. This way the interpreter will know what encoding is used for the code and strings in it.
And use Unicode strings wherever possible. If you are lucky, the function may work well with Unicode arguments (e.g. u"something…" instead of "something...") even if it was not written with Unicode in mind.
I would just like to note well:
unicode('ポケモン')
(a non-unicode string constant passed to the unicode() constructor)
IS NOT THE SAME AS
u'ポケモン'
(a unicode string constant)
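The reason, as far as I can tell, is that in Python 2 unicode(some_byte_string) decodes with the default codec, which is ASCII unless it has been changed, and the UTF-8 bytes of the Japanese text are not ASCII. The sketch below reproduces that step with explicit encode/decode calls so that it behaves the same on Python 2 and 3. The variable names are made up for the example.

```python
# -*- coding: utf-8 -*-

# The bytes that a plain 'ポケモン' literal holds in a UTF-8 Python 2 source file.
raw = u'ポケモン'.encode('utf-8')

# The first byte is the '\xe3' mentioned in the error message.
print(raw[:1] == b'\xe3')  # True

try:
    # Roughly what unicode(raw) attempts by default in Python 2.
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the real encoding gives back the same text as the u'...' literal.
decoded = raw.decode('utf-8')

print(ascii_ok)  # False
print(decoded == u'ポケモン')  # True
```

So if you really do start from a byte string, unicode('ポケモン', 'utf-8') is the form that names the encoding; otherwise just write the u'...' literal.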