Best output type and encoding practices for repr() functions?

2023-01-14 19:40 Author:

Lately, I've had lots of trouble with __repr__(), format(), and encodings. Should the output of __repr__() be encoded or be a unicode string? Is there a best encoding for the result of __repr__() in Python? What I want to output does have non-ASCII characters.

I use Python 2.x, and want to write code that can easily be adapted to Python 3. The program thus uses

# -*- coding: utf-8 -*-
from __future__ import unicode_literals, print_function  # The 'Hello' literal represents a Unicode object

Here are some additional problems that have been bothering me, and I'm looking for a solution that solves them:

Printing to a UTF-8 terminal should work (I have sys.stdout.encoding set to UTF-8, but it would be best if other cases worked too).
Piping the output to a file (encoded in UTF-8) should work (in this case, sys.stdout.encoding is None).
My code for many __repr__() functions currently has many return ….encode('utf-8'), and that's heavy. Is there anything robust and lighter?
In some cases, I even have ugly beasts like return ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8'), i.e., the representation of objects is decoded, put into a formatting string, and then re-encoded. I would like to avoid such convoluted transformations.

What would you recommend to do in order to write simple __repr__() functions that behave nicely with respect to these encoding questions?

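In Python 2, __repr__ (and __str__) must return a byte string (str), not a unicode object: a unicode return value is implicitly encoded with the default ASCII codec, which fails on non-ASCII characters. In Python 3, the situation is reversed: __repr__ and __str__ must return text (str) objects, not bytes (née string) objects: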

class Foo(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}' 

class Bar(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'.encode('utf8')

print(repr(Bar()))
# ☺  (on a UTF-8 terminal)
repr(Foo())
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)

In Python 2, you don't really have a choice. You have to pick an encoding for the return value of __repr__.
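
One way to keep that choice in a single place is sketched below. It is only an illustration: the names TextReprMixin, _repr_text and Smiley are hypothetical, and UTF-8 is an assumed encoding, not something Python mandates. Subclasses build their representation as text, and the mixin encodes it only when running on Python 2.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] < 3


class TextReprMixin(object):
    """Hypothetical mixin: subclasses implement _repr_text() returning text."""

    def __repr__(self):
        text = self._repr_text()
        if PY2:
            # Python 2 expects bytes here; UTF-8 is an assumption of this sketch.
            return text.encode('utf-8')
        # Python 3 expects text, so return it unchanged.
        return text


class Smiley(TextReprMixin):
    def _repr_text(self):
        return u'<Smiley \N{WHITE SMILING FACE}>'
```

With this layout, each class only ever deals with text, and the encoding decision is not repeated in every __repr__.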

By the way, have you read the PrintFails wiki? It may not directly answer your other questions, but I did find it helpful in illuminating why certain errors occur.

When using from __future__ import unicode_literals,

('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')

can be more simply written as

str('<{}>').format(repr(x))

assuming str encodes to utf-8 on your system.

Without from __future__ import unicode_literals, the expression can be written as:

'<{}>'.format(repr(x))
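
If the same source has to run on both Python 2 and Python 3, a small helper can hide the decode step. The following is a rough sketch, assuming nested objects return UTF-8 encoded bytes from __repr__ on Python 2; text_repr and Wrapper are hypothetical names introduced only for this example.

```python
import sys


def text_repr(obj, encoding='utf-8'):
    """Return repr(obj) as text on both Python 2 and Python 3."""
    r = repr(obj)
    if isinstance(r, bytes):
        # Only reached on Python 2, where repr() returns a byte string.
        r = r.decode(encoding)
    return r


class Wrapper(object):
    def __init__(self, inner):
        self.inner = inner

    def __repr__(self):
        # Do all formatting in text, then encode once at the boundary.
        text = u'<{}>'.format(text_repr(self.inner))
        if sys.version_info[0] < 3:
            return text.encode('utf-8')
        return text
```

The formatting itself never mixes bytes and text, so the decode/format/encode chain collapses to a single encode at the end, and only on Python 2.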

I think a decorator can manage __repr__ incompatibilities in a sane way. Here's what I use:

from __future__ import unicode_literals, print_function
import sys

def force_encoded_string_output(func):

    if sys.version_info.major < 3:

        def _func(*args, **kwargs):
            return func(*args, **kwargs).encode(sys.stdout.encoding or 'utf-8')

        return _func

    else:
        return func


class MyDummyClass(object):

    @force_encoded_string_output
    def __repr__(self):
        return 'My Dummy Class! \N{WHITE SMILING FACE}'
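
A possible variant of this decorator, shown here only as a sketch, uses functools.wraps so that the wrapped method keeps its name and docstring on Python 2. The names encoded_on_py2 and Temperature are made up for the example; the fallback to 'utf-8' mirrors the decorator above.

```python
import functools
import sys


def encoded_on_py2(func):
    """Encode the text returned by func on Python 2; do nothing on Python 3."""
    if sys.version_info[0] >= 3:
        return func

    @functools.wraps(func)  # keep __name__ and __doc__ of the original method
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).encode(sys.stdout.encoding or 'utf-8')

    return wrapper


class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius

    @encoded_on_py2
    def __repr__(self):
        return u'<Temperature {}\N{DEGREE SIGN}C>'.format(self.celsius)
```

On Python 3 the method is returned untouched, so repr(Temperature(21.5)) is plain text; on Python 2 the same source yields an encoded byte string.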

I use a function like the following:

def stdout_encode(u, default='UTF8'):
    if sys.stdout.encoding:
        return u.encode(sys.stdout.encoding)
    return u.encode(default)

Then my __repr__ functions look like this:

def __repr__(self):
    return stdout_encode(u'<MyClass {0} {1}>'.format(self.abcd, self.efgh))
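
One caveat with encoding to sys.stdout.encoding is that the terminal's codec may not be able to represent every character, in which case encode() raises UnicodeEncodeError. A hedged variant is sketched below: stdout_encode_safe and its encoding parameter are hypothetical additions, and the 'backslashreplace' error handler is one possible policy among several. Like stdout_encode, it returns bytes, so it is only suitable as a __repr__ return value on Python 2.

```python
import sys


def stdout_encode_safe(u, default='utf-8', encoding=None):
    """Encode text for stdout, escaping characters the codec cannot represent."""
    # Explicit encoding wins, then the terminal's, then the default
    # (sys.stdout.encoding is None when output is piped on Python 2).
    enc = encoding or getattr(sys.stdout, 'encoding', None) or default
    # 'backslashreplace' turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    return u.encode(enc, 'backslashreplace')
```

For instance, with an ASCII-only target the smiley comes out as the six ASCII characters \u263a instead of raising an error, while a UTF-8 target receives the real character.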
