
Why do Python IDLE and the console produce different results?

I wrote a simple Python script to translate Chinese punctuation into English punctuation.

import codecs, sys

def trcn():
    tr = lambda x: x.translate(str.maketrans(""",。!?;:、()【】『』「」﹁﹂“”‘’《》~¥…—×""", """,.!?;:,()[][][][]""''<>~$^-*"""))
    out = codecs.getwriter('utf-8')(sys.stdout)
    for line in sys.stdin:
        out.write(tr(line))

if __name__ == '__main__':
    if not len(sys.argv) == 1:
        print("usage:\n\t{0} STDIN STDOUT".format(sys.argv[0]))
        sys.exit(-1)
    trcn()
    sys.exit(0)
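The translation step works on str objects by itself. The traceback points at the write, not at translate(). Here is a minimal sketch of the same str.maketrans/str.translate technique, using a hypothetical subset of the punctuation pairs (CN, EN and to_ascii_punct are illustrative names, not part of the original script):

```python
# Sketch of the str.maketrans / str.translate technique used in trcn().
# CN/EN are a hypothetical subset of the question's punctuation table.
CN = "。，！？：；（）【】《》“”"
EN = ".,!?:;()[]<>\"\""

# maketrans() needs two strings of equal length: the i-th character
# of CN is replaced by the i-th character of EN.
TABLE = str.maketrans(CN, EN)


def to_ascii_punct(text: str) -> str:
    """Replace each full-width punctuation mark with its ASCII counterpart."""
    return text.translate(TABLE)
```

For example, to_ascii_punct("你好，世界！") returns "你好,世界!". The result is still a str, so the problem only appears once something tries to encode and write it.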

But something goes wrong with Unicode, and I can't get it to work. Error message:

Traceback (most recent call last):
  File "trcn.py", line 13, in <module>
    trcn()
  File "trcn.py", line 7, in trcn
    out.write(tr(line))
  File "C:\Python31\Lib\codecs.py", line 356, in write
    self.stream.write(data)
TypeError: must be str, not bytes

After that, I tested out.write() in both IDLE and the console, and they produced different results. I don't know why.

In IDLE

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys,codecs
>>> out = codecs.getwriter('utf-8')(sys.stdout)
>>> out.write('hello')
hello
>>>

In the console

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys,codecs
>>> out = codecs.getwriter('utf-8')(sys.stdout)
>>> out.write('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\Lib\codecs.py", line 356, in write
    self.stream.write(data)
TypeError: must be str, not bytes
>>>
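You can reproduce the console's complaint without a terminal. The StreamWriter that codecs.getwriter() returns encodes each str and passes the resulting bytes to the wrapped stream's write(). In this minimal sketch, RecordingStream is a hypothetical helper, and an io.TextIOWrapper over io.BytesIO stands in for the console's sys.stdout:

```python
import codecs
import io


class RecordingStream:
    """Hypothetical stand-in for an output stream: records every object passed to write()."""

    def __init__(self):
        self.received = []

    def write(self, data):
        self.received.append(data)


# The StreamWriter encodes the str and forwards bytes to the wrapped stream.
recorder = RecordingStream()
out = codecs.getwriter("utf-8")(recorder)
out.write("hello")
print(recorder.received)  # [b'hello']

# A real text stream (like the console's sys.stdout) rejects those bytes.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
error = None
try:
    codecs.getwriter("utf-8")(text_stream).write("hello")
except TypeError as exc:
    error = exc
print("text stream raised:", repr(error))
```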

Platform: Windows XP EN


Your encoded output is coming out of the encoder as bytes, and therefore must be passed to sys.stdout.buffer:

out = codecs.getwriter('utf-8')(sys.stdout.buffer)

I'm not entirely sure why your code acts differently in IDLE versus the console, but the above may help. Perhaps IDLE's sys.stdout actually expects bytes instead of characters (hopefully it has a .buffer that also expects bytes).
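Here is a minimal sketch of that fix, assuming Python 3's io stack. fake_stdout (an io.TextIOWrapper over an io.BytesIO) is a hypothetical stand-in for sys.stdout, used so the written bytes can be inspected. In the real script, the argument would simply be sys.stdout.buffer:

```python
import codecs
import io

# Hypothetical stand-in for sys.stdout: a text layer over a binary buffer.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# Wrap the *binary* buffer, so the encoder's bytes reach a stream that accepts bytes.
out = codecs.getwriter("utf-8")(fake_stdout.buffer)
out.write("你好,世界!\n")

# The underlying BytesIO now holds the UTF-8 encoded text.
raw = fake_stdout.buffer.getvalue()
print(raw)
```

If the same script also uses print(), call sys.stdout.flush() before writing to the buffer. Otherwise, text still pending in the text layer can appear after the bytes written directly.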


IDLE redirects stdout to its own GUI window. Apparently that replacement stream accepts bytes as well as strings, which the normal stdout doesn't.

Either decode it back to a Unicode str, or write it to sys.stdout.buffer.
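The sketch below shows both options as a reworked version of the question's loop. TABLE is a hypothetical subset of the original punctuation table, and trcn_text, trcn_bytes and demo are illustrative names. The streams are passed in as parameters so the functions can be exercised with io.StringIO and io.BytesIO:

```python
import codecs
import io

# Hypothetical subset of the question's punctuation table.
TABLE = str.maketrans("，。！？", ",.!?")


def trcn_text(src, dst):
    """Option 1: keep everything as str and write to a text stream (e.g. sys.stdout)."""
    for line in src:
        dst.write(line.translate(TABLE))


def trcn_bytes(src, dst_buffer):
    """Option 2: encode to UTF-8 and write to a binary stream (e.g. sys.stdout.buffer)."""
    out = codecs.getwriter("utf-8")(dst_buffer)
    for line in src:
        out.write(line.translate(TABLE))


def demo():
    """Run both options on an in-memory sample and return their outputs."""
    sample = "你好，世界！\n"
    text_out = io.StringIO()
    trcn_text(io.StringIO(sample), text_out)
    bytes_out = io.BytesIO()
    trcn_bytes(io.StringIO(sample), bytes_out)
    return text_out.getvalue(), bytes_out.getvalue()
```

In the script itself, trcn_text(sys.stdin, sys.stdout) would replace the original loop. It leaves the encoding to the console's own text layer, which may still fail on characters the console's code page cannot represent. By contrast, trcn_bytes(sys.stdin, sys.stdout.buffer) always emits UTF-8.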


It is obvious that the console's encoding is not UTF-8. You can specify the encoding when invoking Python in the console (the PYTHONIOENCODING environment variable); look it up in the Python docs.
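The sketch below checks, and overrides, the encoding of the standard streams. It assumes the override meant here is the PYTHONIOENCODING environment variable, which is an environment variable rather than a command-line option. child_stdout_encoding is a hypothetical helper that starts a fresh interpreter via subprocess. This only changes the encoding of the text layer. Wrapping sys.stdout itself with codecs.getwriter() would still pass bytes to a text stream and raise the same TypeError.

```python
import os
import subprocess
import sys

# One-liner run in the child interpreter to report its stdout encoding.
CHECK = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as seen by a fresh Python process.

    If io_encoding is given, it is passed via the PYTHONIOENCODING
    environment variable, which sets the encoding of stdin/stdout/stderr.
    """
    env = dict(os.environ)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("this process:", sys.stdout.encoding)
    print("child with PYTHONIOENCODING=utf-8:", child_stdout_encoding("utf-8"))
```

On a Windows console, the same override is typically set with `set PYTHONIOENCODING=utf-8` before starting Python.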

