Reading Text with Accent - Python

2023-01-16 02:04

I wrote a Python script that connects to Gmail and prints an email's text. But my emails often contain words with accents, and that is my problem.

For example, a text that I got, "PLANO DE S=C3=9ADE", should be printed as "PLANO DE SAÚDE".

How can I make my email text legible? What can I use to convert these accented letters?

Thanks,

The code suggested by Andrey works fine on Windows, but on Linux I still get the wrong output:

>>> b = 'PLANO DE S=C3=9ADE'
>>> s = b.decode('quopri').decode('utf-8')
>>> print s
PLANO DE SÃDE

Rafael,

Thanks, you are correct about the word; it was misspelled. But the problem is still the same here. Another example, where the correct word is "Observações":

>>> b = 'Observa=C3=A7=C3=B5es'
>>> s = b.decode('quopri').decode('utf-8')
>>> print s
ObservaÃ§Ãµes

I am using Debian with a UTF-8 locale:

$ locale
LANG=en_US.UTF-8

Andrey,

Thanks for your time. I agree with your explanation, but I still have the same problem here. Take a look at my test:

   >>> s = 'Observa=C3=A7=C3=B5es'
   >>> s2 = s.decode('quopri').decode('utf-8')
   >>> print s
   Observa=C3=A7=C3=B5es
   >>> print s2
   ObservaÃ§Ãµes
   >>> import locale
   >>> ENCODING = locale.getpreferredencoding()
   >>> print s.encode(ENCODING)
   Observa=C3=A7=C3=B5es
   >>> print s2.encode(ENCODING)
   ObservaÃ§Ãµes
   >>> print ENCODING
   UTF-8

This encoding is called quoted-printable. In your example, you have a string (Python's unicode) encoded as UTF-8 bytes (Python's str), which are in turn encoded as quoted-printable bytes. So the right way to get the string value is to undo both layers, in reverse order:

>>> b = 'PLANO DE S=C3=9ADE'
>>> s = b.decode('quopri').decode('utf-8')
>>> print s
PLANO DE SÚDE
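The same two steps can also be spelled out one at a time. This is only a sketch: it uses the standard-library `quopri` module instead of the `'quopri'` codec shown above, and the variable names are made up for illustration.

```python
import quopri

raw = b'PLANO DE S=C3=9ADE'            # quoted-printable text as received

# Step 1: undo quoted-printable, which yields the UTF-8 bytes.
utf8_bytes = quopri.decodestring(raw)

# Step 2: undo UTF-8, which yields the Unicode string.
text = utf8_bytes.decode('utf-8')
```

Keeping the intermediate `utf8_bytes` around makes it easier to check which layer goes wrong when the output looks garbled.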

Update: There might be some issues with the console encoding, though. s holds a fully correct Unicode string value (of Python type unicode). But when you use the print statement, the value must be converted to bytes (Python's str) in order to be written to OS file descriptor 1 (standard output). So the print statement implementation checks your console encoding, makes some guesses, and prints the result. In fact, in Python 2 the results can differ between printing from the interactive shell, running your process non-interactively, and running your process with the output redirected to a file.

The best way to output encoded strings in Python 2 is not agreed upon. The two ways that make the most sense are:
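One possible explanation for the garbled output (an assumption, not something confirmed in this thread) is that the terminal displays the UTF-8 bytes as if they were Latin-1. A small sketch reproduces exactly that symptom:

```python
# The correctly decoded word, written with escapes so the snippet
# does not depend on the source file encoding.
text = u'Observa\u00e7\u00f5es'

# What print sends to the terminal when the output encoding is UTF-8.
utf8_bytes = text.encode('utf-8')

# What a consumer would display if it assumed Latin-1 (hypothetical scenario).
garbled = utf8_bytes.decode('latin-1')
```

Here `garbled` comes out as "ObservaÃ§Ãµes", which matches the wrong output in the question. If that is what happens, the string itself is fine and the terminal or SSH client settings are worth checking.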

1) Use locale's encoding guess and manually encode strings.

import locale
ENCODING = locale.getpreferredencoding()

print s.encode(ENCODING)

2) Use an encoding option (command-line, hard-coded or whatever).

import sys
from getopt import getopt

ENCODING = 'UTF-8'  # default when no --encoding option is given
opts, args = getopt(sys.argv[1:], '', ['encoding='])
for opt, arg in opts:
    if opt == '--encoding':
        ENCODING = arg

print s.encode(ENCODING)

Update 2: If nothing helps and you are still sure that your console encoding is set to UTF-8 and your font supports the characters, then try this:

# -*- coding: utf-8 -*-
# (the line above is needed when this is saved as a script with non-ASCII text)
import sys, os
ENCODING = 'UTF-8'
stdout = os.fdopen(sys.stdout.fileno(), 'wb')
s = u'привет' # Don't forget to use a Unicode literal starting with u''
stdout.write(s.encode(ENCODING))

At this point you should see the Russian word привет in Cyrillic characters in your console :)

If this is the case, then you should use this binary stdout instead of normal sys.stdout.
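The idea of encoding explicitly and writing raw bytes can be wrapped in a small helper. This is a rough sketch: `write_encoded` is a made-up name, and an in-memory buffer stands in for the binary stdout so that the result can be inspected.

```python
import io


def write_encoded(binary_stream, text, encoding='UTF-8'):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    binary_stream.write(text.encode(encoding))


buf = io.BytesIO()                       # stand-in for a binary stdout
write_encoded(buf, u'PLANO DE S\u00daDE')
data = buf.getvalue()                    # the exact bytes the console would receive
```

In Python 3 the binary stream underneath standard output is available as `sys.stdout.buffer`, so the helper could be given that instead of the buffer.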

Your string is wrong, look:

'PLANO DE S=C3=9ADE' == 'PLANO DE S\xc3\x9aDE'

Where is the missing "A" in SAÚDE?

If you decode 'PLANO DE S=C3=9ADE' as quoted-printable, you will only get 'PLANO DE SÚDE'.

Running this code here on Linux (Ubuntu 9.10):
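To see what the payload would look like if the "A" were present, you can encode the intended text yourself. A minimal sketch with the standard-library `quopri` module (the variable names are only for illustration):

```python
import quopri

intended = u'PLANO DE SA\u00daDE'        # the word the asker expected

# Encode to UTF-8 first, then to quoted-printable.
encoded = quopri.encodestring(intended.encode('utf-8'))

# Decoding both layers again should give back the intended text.
roundtrip = quopri.decodestring(encoded).decode('utf-8')
```

The encoded form contains "SA=C3=9ADE", with the "A" before the escaped bytes, so the string in the question was already missing that letter before any decoding took place.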

>>> b = 'PLANO DE S=C3=9ADE'
>>> s = b.decode('quopri').decode('utf-8')
>>> print s
PLANO DE SÚDE
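Since the text comes from an email, another option is to let the standard-library `email` package undo the transfer encoding based on the message headers. The sketch below is written for Python 3, not the Python 2 used in this thread, and the raw message and the `body_text` helper are invented for illustration. It only handles a simple, non-multipart message.

```python
import email

# A made-up, minimal message with a quoted-printable UTF-8 body.
RAW = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'\r\n'
    b'Observa=C3=A7=C3=B5es\r\n'
)


def body_text(raw_bytes):
    """Return the decoded body of a simple (non-multipart) message."""
    msg = email.message_from_bytes(raw_bytes)
    # decode=True undoes the Content-Transfer-Encoding and returns bytes.
    payload = msg.get_payload(decode=True)
    # Fall back to UTF-8 when the message declares no charset (an assumption).
    charset = msg.get_content_charset() or 'utf-8'
    return payload.decode(charset)


text = body_text(RAW)
```

This way the script does not have to guess whether a given message uses quoted-printable, base64, or no transfer encoding at all.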
