Spanish text in .py files
This is the code:
A = "Diga sí por cualquier número de otro cuidador.".encode("utf-8")
I get this error:
'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
I tried numerous encodings unsuccessfully.
Edit:
I already have this at the beginning
# -*- coding: utf-8 -*-
Changing to
A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
doesn't help
Are you using Python 2?
In Python 2, that string literal is a bytestring. You're trying to encode it, but you can encode only a Unicode string, so Python will first try to decode the bytestring to a Unicode string using the default "ascii" encoding.
Unfortunately, your string contains non-ASCII characters, so it can't be decoded to Unicode.
The best solution is to use a Unicode string literal, like this:
A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
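To see why the plain literal fails, the hidden step Python 2 performs can be spelled out by hand. This is only a sketch of the behaviour described above, not the interpreter's actual code path: py2_style_encode is a made-up helper, and byte literals are used so the snippet behaves the same on Python 2 and 3.

```python
# Illustration only: py2_style_encode is a hypothetical helper that mimics
# what Python 2 does when .encode() is called on a bytestring.

def py2_style_encode(bytestring, target="utf-8"):
    # Step 1 (implicit in Python 2): decode with the default 'ascii' codec.
    text = bytestring.decode("ascii")
    # Step 2: encode the resulting Unicode string as requested.
    return text.encode(target)

utf8_bytes = b"Diga s\xc3\xad"  # "Diga si" with i-acute, as UTF-8 bytes

try:
    py2_style_encode(utf8_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)  # the hidden ASCII decode is what fails

# A Unicode literal skips the hidden decode step entirely.
encoded = u"Diga s\xed".encode("utf-8")
```

The failure comes from step 1, which is why the error mentions decoding even though the code only asks to encode.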
The error message 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128) says that the 7th byte is 0xed. This is either the first byte of the UTF-8 sequence for some (maybe CJK) high-ordinal Unicode character (that's absolutely not consistent with the reported facts), or it's your i-acute encoded in Latin-1 or cp1252. I'm betting on cp1252.
If your file were encoded in UTF-8, the offending byte would be 0xc3, not 0xed:
Preliminaries:
>>> import unicodedata
>>> unicodedata.name(u'\xed')
'LATIN SMALL LETTER I WITH ACUTE'
>>> uc = u'Diga s\xed por'
What happens if file is encoded in UTF-8:
>>> infile = uc.encode('utf8')
>>> infile
'Diga s\xc3\xad por'
>>> infile.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
#### NOT the message reported in the question ####
What happens if file is encoded in cp1252 or latin1 or similar:
>>> infile = uc.encode('cp1252')
>>> infile
'Diga s\xed por'
>>> infile.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
#### As reported in the question ####
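The same diagnosis can be checked outside an interactive session. A minimal sketch, using byte literals so it runs on both Python 2 and 3; the variable names are only illustrative.

```python
i_acute = u"\xed"  # LATIN SMALL LETTER I WITH ACUTE

as_cp1252 = i_acute.encode("cp1252")  # one byte: 0xed
as_utf8 = i_acute.encode("utf-8")     # two bytes: 0xc3 0xad

# In valid UTF-8, 0xed only appears as the lead byte of a three-byte
# sequence, for example U+D55C (a Hangul syllable).
hangul = b"\xed\x95\x9c".decode("utf-8")

# A lone 0xed followed by ASCII text is therefore not valid UTF-8.
try:
    b"Diga s\xed por".decode("utf-8")
    lone_byte_is_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_utf8 = False
```

So a 0xed followed directly by a space points to a single-byte encoding such as cp1252 or Latin-1 rather than UTF-8.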
Having # -*- coding: utf-8 -*- at the start of your code does not magically ensure that your file is encoded in UTF-8 -- that's up to you and your text editor.
Actions:
- Save your file as UTF-8.
- As suggested by others, use a Unicode literal: u'blah blah'
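If the editor offers no obvious "save as UTF-8" option, the file can be checked and converted with a few lines of Python. This is a rough sketch under one assumption: that the file is currently in cp1252, as guessed above. The function names is_valid_utf8 and reencode_to_utf8 are made up for this example, and the demo works on a throwaway temporary file.

```python
import os
import tempfile

def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file as UTF-8, assuming it is currently in source_encoding."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.decode(source_encoding).encode("utf-8"))

# Demo on a throwaway file that imitates a script saved as cp1252.
fd, demo_path = tempfile.mkstemp(suffix=".py")
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b'# -*- coding: utf-8 -*-\nA = u"Diga s\xed"\n')

before = is_valid_utf8(demo_path)   # False: the lone 0xed is not UTF-8
reencode_to_utf8(demo_path)
after = is_valid_utf8(demo_path)    # True once the bytes are rewritten
with open(demo_path, "rb") as f:
    fixed_bytes = f.read()
os.remove(demo_path)
```

Keep a backup before running something like this on a real file, since a wrong guess for source_encoding silently produces the wrong characters.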
Put this on the first line of your code:
# -*- coding: utf-8 -*-
You should specify your source file's encoding by adding the following line to the very beginning of your code (assuming that your file is encoded in UTF-8):
# Encoding: UTF-8
Otherwise, Python 2 will assume an ASCII encoding and fail during parsing.
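Both spellings of the declaration seen in this thread are recognised, because the interpreter only looks for a "coding:" or "coding=" pattern inside a comment on the first or second line. The sketch below uses a pattern modelled on the one documented in PEP 263; declared_encoding is a hypothetical helper, and the real check also handles details (such as a byte-order mark) that are ignored here.

```python
import re

# Pattern modelled on the one given in PEP 263; treat it as an approximation
# of what the interpreter checks on the first two lines of a file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(line):
    """Return the encoding named in a comment line, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None

emacs_style = declared_encoding("# -*- coding: utf-8 -*-")
short_style = declared_encoding("# Encoding: UTF-8")
no_declaration = declared_encoding("A = 'hello'")
```

The declaration only tells Python how to read the bytes; it does not change how the editor saved them.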
You are probably operating on a normal (byte) string, not a unicode string:
>> type(u"zażółć gęślą jaźń")
-> <type 'unicode'>
>> type("zażółć gęślą jaźń")
-> <type 'str'>
so
u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
should work.
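If you want to use non-ASCII characters in your source file, put
# -*- coding: utf-8 -*-
in the first line of your script. Note that this only declares the file's encoding; to make string literals unicode by default in Python 2, add from __future__ import unicode_literals at the top of the script.
See also the docs.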
P.S. It's Polish in examples above :)
In the first or second line of your code, type the comment:
# -*- coding: latin-1 -*-
For a list of symbols supported see: http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
And the languages covered: http://en.wikipedia.org/wiki/ISO_8859-1
Maybe this is what you want to do:
A = 'Diga sí por cualquier número de otro cuidador'.decode('latin-1')
And don't forget to add # -*- coding: latin-1 -*- at the beginning of your code.
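If the text then has to leave the program as UTF-8, the decoded string can be encoded again. A small sketch, with a byte literal standing in for the bytes of a file saved as Latin-1; the variable names are illustrative only.

```python
# Bytes as they would appear in a file saved as Latin-1
# (0xed is i-acute, 0xfa is u-acute).
latin1_bytes = b"Diga s\xed por cualquier n\xfamero"

text = latin1_bytes.decode("latin-1")  # now a Unicode string
utf8_bytes = text.encode("utf-8")      # re-encode for output if required
```

This only gives the right characters when the file really is saved as Latin-1 (or a compatible encoding such as cp1252 for these letters).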