
Spanish text in .py files

This is the code

A = "Diga sí por cualquier número de otro cuidador.".encode("utf-8")

I get this error:

'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)

I tried numerous encodings unsuccessfully.

Edit:

I already have this at the beginning

# -*- coding: utf-8 -*-

Changing to

A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")

doesn't help


Are you using Python 2?

In Python 2, that string literal is a bytestring. You're trying to encode it, but you can encode only a Unicode string, so Python will first try to decode the bytestring to a Unicode string using the default "ascii" encoding.

Unfortunately, your string contains non-ASCII characters, so it can't be decoded to Unicode.
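
To make the hidden step visible, here is a minimal sketch that performs the implicit ASCII decode by hand. It assumes the literal's bytes are UTF-8 and uses a shortened sample string; the variable names are mine, and the b"..." prefix lets it run on Python 2.6+ as well as Python 3.

```python
# Bytes of u"Diga s\xed por" as they would sit in a UTF-8 source file
# (assumption: the file is UTF-8; the sample string is shortened).
raw = b"Diga s\xc3\xad por"

# Python 2's str.encode() first decodes with the default "ascii" codec.
# Doing that step explicitly reproduces the failure.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes really use works, and the
# resulting Unicode string can then be encoded without trouble.
text = raw.decode("utf-8")
encoded = text.encode("utf-8")

print(error)
print(encoded == raw)
```

The point is that the failure comes from the decode, not from the encode you asked for.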

The best solution is to use a Unicode string literal, like this:

A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")


Error message: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)

says that the 7th byte is 0xed. This is either the first byte of a three-byte UTF-8 sequence for some high-ordinal Unicode character (U+D000 to U+D7FF, mostly Hangul syllables), which is absolutely not consistent with the reported facts, or it's your i-acute encoded in Latin-1 or cp1252. I'm betting on cp1252.

If your file was encoded in UTF-8, the offending byte would be not 0xed but 0xc3:

Preliminaries:
>>> import unicodedata
>>> unicodedata.name(u'\xed')
'LATIN SMALL LETTER I WITH ACUTE'
>>> uc = u'Diga s\xed por'

What happens if file is encoded in UTF-8:
>>> infile = uc.encode('utf8')
>>> infile
'Diga s\xc3\xad por'
>>> infile.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
#### NOT the message reported in the question ####

What happens if file is encoded in cp1252 or latin1 or similar:
>>> infile = uc.encode('cp1252')
>>> infile
'Diga s\xed por'
>>> infile.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
#### As reported in the question ####

Having # -*- coding: utf-8 -*- at the start of your code does not magically ensure that your file is encoded in UTF-8 -- that's up to you and your text editor.

Actions:

  1. Save your file as UTF-8.
  2. As suggested by others, use a Unicode literal: u'blah blah'
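
For the first action, one way to check what your editor actually wrote is to read the file as raw bytes and try to decode them as UTF-8. This is only a sketch: the helper names is_valid_utf8 and write_temp and the temporary demo files are made up for illustration. Passing the check does not prove the file was meant as UTF-8, only that its bytes form valid UTF-8.

```python
import io
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with io.open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def write_temp(data):
    """Write raw bytes to a temporary file and return its path (demo only)."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# The same source line saved in two different encodings.
line = u'A = u"Diga s\xed por"\n'
utf8_path = write_temp(line.encode("utf-8"))
cp1252_path = write_temp(line.encode("cp1252"))

utf8_ok = is_valid_utf8(utf8_path)      # file saved as UTF-8
cp1252_ok = is_valid_utf8(cp1252_path)  # file saved as cp1252

# Clean up the demo files.
os.remove(utf8_path)
os.remove(cp1252_path)

print(utf8_ok)
print(cp1252_ok)
```

In practice you would call is_valid_utf8 on your own .py file instead of the temporary ones.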


Put this on the first line of your code:

# -*- coding: utf-8 -*-


You should specify your source file's encoding by adding the following line to the very beginning of your code (assuming that your file is encoded in UTF-8):

# Encoding: UTF-8

Otherwise, Python will assume an ASCII encoding and fail during parsing.


You are probably operating on a plain byte string (str), not a unicode string:

>> type(u"zażółć gęślą jaźń")
-> <type 'unicode'>

>> type("zażółć gęślą jaźń")
-> <type 'str'>

so

u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")

should work.

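If you want to use non-ASCII characters in your source, put

# -*- coding: utf-8 -*-

in the first or second line of your script. Note that this only tells Python how to decode the file; it does not make string literals unicode by default. For that, Python 2.6+ offers from __future__ import unicode_literals.

See also the docs.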

P.S. It's Polish in examples above :)


In the first or second line of your code, type the comment:

    # -*- coding: latin-1 -*-

For a list of symbols supported see: http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29

And the languages covered: http://en.wikipedia.org/wiki/ISO_8859-1


Maybe this is what you want to do:

A = 'Diga sí por cualquier número de otro cuidador'.decode('latin-1')

And don't forget to add # -*- coding: latin-1 -*- at the beginning of your code.
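
If the file really is saved as Latin-1, the decode step can be sketched like this. The byte string below stands in for what Python 2 would hold for the literal in a Latin-1 file (shortened, and written with escapes so the snippet does not depend on how you save it); the variable names are only for illustration.

```python
# Latin-1 bytes for "Diga s\xed por cualquier n\xfamero"
# (assumption: this is what a Latin-1 source file would contain).
latin1_bytes = b"Diga s\xed por cualquier n\xfamero"

# Decode with the encoding the bytes were written in to get Unicode text.
A = latin1_bytes.decode("latin-1")

# From Unicode you can encode to whatever the consumer expects, e.g. UTF-8.
utf8_bytes = A.encode("utf-8")

print(repr(A))
print(repr(utf8_bytes))
```

Each accented letter is one byte in Latin-1 and two bytes in UTF-8, which is why the two byte strings differ.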
