Mixing unicode and str in Python 2.x: any problems?
mystr = 'aaaa'
myvar = u'My string %s' % str(mystr)
Can this be a problem in the future? I'm working with some in-house code that uses the email modules in Python and found some code like this. mystr will always contain only ASCII characters, since it comes from a list of predefined ASCII-only strings.
I didn't write the code, and whether it uses str(mystr) or plain mystr doesn't change the question.
With the first snippet, am I going to get a safe unicode object, or do I have to do
mystr = u'aaaa'
myvar = u'My string %s' % mystr
or
mystr = 'aaaa'
myvar = u'My string %s' % unicode(mystr)
?
(I know this is not the correct way of doing it and that I should handle the exceptions. I'm only asking whether the first snippet returns a valid unicode object, or whether Python messes up its internals or something when doing it.)
Try putting actual non-ASCII characters in the strings (like umlauts or Cyrillic) and watch all hell break loose. :)
s = 'свят' # world
v = u'здравей %s' % s # hello %s
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)
The problem is that you will most likely write your application this way, and then one bright, shiny day some Russian or German user will type in her name and suddenly get an Internal Server Error for having a non-ASCII character in it.
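To make the failure above easier to reproduce, here is a minimal sketch that spells out the ASCII decode Python 2 performs implicitly when a str is formatted into a unicode template. It uses explicit byte strings so it behaves the same on Python 2 and 3. The helper format_greeting and the variable names are made up for illustration, and UTF-8 is only an assumption about how the incoming bytes were encoded.

```python
# -*- coding: utf-8 -*-

# Bytes as they might arrive from a form or a file (assumed to be UTF-8).
raw_name = u'свят'.encode('utf-8')


def format_greeting(raw, encoding):
    # Hypothetical helper: decode the byte string explicitly, then format.
    return u'здравей %s' % raw.decode(encoding)


# Decoding as ASCII is what Python 2 does implicitly, and it fails here.
try:
    format_greeting(raw_name, 'ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the encoding the bytes were actually written in works.
greeting = format_greeting(raw_name, 'utf-8')
```

The point is only that the error comes from the decode step, so naming the real encoding at that step is what avoids it.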
I know... I'm asking about the situation in my example, using ascii only in
No, there will be no problem. IMHO this is a flaw in Python, because it is a bug waiting to bite. This should have been a fatal error but, I guess for historical reasons, it isn't.
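As long as the regular 8-bit string contains only ASCII characters, you're fine: when Python 2 combines a str with a unicode object, it implicitly decodes the str using the default encoding, which is ASCII unless someone has changed it. Keeping such data in a str can save processing time and/or memory if you really only need ASCII.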
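For the ASCII-only case from the question, a rough sketch of the same thing with the decode written out explicitly. The is_ascii helper is hypothetical, just one way to check the assumption that the predefined list really is ASCII-only.

```python
# -*- coding: utf-8 -*-

mystr = b'aaaa'  # in Python 2, a plain 'aaaa' literal is already a byte string

# Explicit form of the conversion Python 2 would otherwise do implicitly.
myvar = u'My string %s' % mystr.decode('ascii')


def is_ascii(raw):
    # Hypothetical helper: True when every byte in raw is below 128.
    return all(byte < 128 for byte in bytearray(raw))
```

If is_ascii holds for every entry in the list, the implicit and the explicit versions give the same unicode object.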
Can it be a problem in the future? Yes, if you take input that may be in a non-ASCII character set and store it in a str. It's also just generally a good idea to be consistent: don't use str objects as storage for text anywhere if you need Unicode widely, unless there is a good reason to.
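One way to stay consistent is to convert everything to unicode at the edges of the program. The to_unicode function below is a hypothetical helper, not something from the standard library, and the UTF-8 default is an assumption you would replace with whatever your input actually uses.

```python
# -*- coding: utf-8 -*-


def to_unicode(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings with encoding."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # bytes is an alias for str on Python 2
        return value.decode(encoding)
    return text_type(value)


mystr = b'aaaa'
myvar = u'My string %s' % to_unicode(mystr)
```

With this, the formatting code only ever sees unicode objects, and a wrong encoding shows up at the boundary instead of somewhere deep in the application.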