How to print out encoded Asian characters (gb2312) on the command prompt?

I am currently working for a company that uses Python 3.1 for casual work, and I've run into this problem: how can I print encoded Asian characters (Chinese, Japanese, Korean) on the command prompt?

I did a bit of research and tried the following, but had no luck:

import sys
import codecs
print(sys.getdefaultencoding()) # prints out UTF-8
fileObj = codecs.open("test.txt", "r", "eucgb2312_cn")
content = fileObj.read()
print(content)

It is the last line that causes this error:

C:\Documents and Settings\Michael Mao\Desktop>test.py
utf-8
Traceback (most recent call last):
  File "C:\Documents and Settings\Michael Mao\Desktop\test.py", line 6, in <module>
    print(u)
  File "C:\tools\Python31\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u5377' in position 3: character maps to <undefined>

I cannot change the default encoding from UTF-8 to anything else, so I reckon that is the problem preventing the output from being rendered correctly.

Can anyone help me out with this? Thanks a lot in advance!


I have solved this problem. I ran into it while writing a dictionary program.

#coding=utf-8
import codecs
import sys
# import imp
# imp.reload(sys) 
# sys.setdefaultencoding('utf8')
dictFileName = 'abstract.dict'
print(sys.getdefaultencoding())  
print(sys.stdout.encoding)

def readDict():
    print("start reading dict...")
    #dictObject = codecs.open(dictFileName,'rb', encoding = 'utf-8')#, encoding = 'utf-8')
    dictObject = open(dictFileName, 'rb')
    try:
        print('open file success!')
        #dictObject.seek(0x1852c)
        chunk = dictObject.read(0x5f0) #0x5f0
        print(len(chunk))
        #chunk = dictObject.read(0x1)
        print('read success')
        #print(chunk.decode("utf-8"))
        #print(chunk.encode('utf-8').decode('gb18030'))
        #sys.stdout.buffer.write(chunk.encode('gb18030'))
        sys.stdout.buffer.write(chunk.decode('utf-8').encode('gb18030'))
    finally:
        dictObject.close()
readDict()
input()
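
The key step in the script above is the final write: the bytes are decoded as UTF-8 and re-encoded as gb18030, then sent to sys.stdout.buffer, which bypasses the console codec that print() would use. A minimal sketch of that step in isolation, assuming the input is valid UTF-8 (the helper name transcode is made up for illustration):

```python
import sys

def transcode(data, src="utf-8", dst="gb18030"):
    """Decode bytes from src and re-encode them as dst (hypothetical helper)."""
    return data.decode(src).encode(dst)

raw = "\u5377".encode("utf-8")   # UTF-8 bytes, as if read from a file opened in 'rb' mode
out = transcode(raw)

# Writing bytes to sys.stdout.buffer skips the text-layer encoder that print() uses.
# Assumption: stdout is a normal text stream; the attribute can be missing if it was replaced.
buffer = getattr(sys.stdout, "buffer", None)
if buffer is not None:
    buffer.write(out + b"\n")
    buffer.flush()
```

Note that reading a fixed number of bytes, as read(0x5f0) does, can cut a multi-byte character in half, in which case decode() raises UnicodeDecodeError.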


I cannot change the default encoding from UTF-8 to anything else

I don't think UTF-8 is being used as the default encoding for your console:

File "C:\tools\Python31\lib\encodings\cp437.py"

cp437 is the old DOS terminal code page, which indeed cannot print Chinese characters.
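
The difference can be checked without a console at all. The following is a small sketch, not a fix: it tests whether the character from the traceback exists in a few codecs, and prints the two settings that are easy to confuse. The helper name can_encode is made up for illustration.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

ch = "\u5377"  # the character named in the traceback

for name in ("cp437", "cp936", "utf-8"):
    print(name, can_encode(ch, name))

# getdefaultencoding() is about implicit str/bytes conversion inside Python;
# sys.stdout.encoding is what print() actually uses for the console.
print(sys.getdefaultencoding(), sys.stdout.encoding)
```

On a console set to code page 437, sys.stdout.encoding reports cp437 even though getdefaultencoding() reports utf-8, which is why print() fails.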

See bug 1602 for a batch file hack to make Windows and Python 3 use UTF-8 (code page 65001) for the console, but in general the console has always been pretty broken for non-ASCII characters, and will continue to be so until someone changes Python to use WriteConsoleW instead of the standard C IO functions.


If you open the cmd window yourself, type the following command before running test.py: mode con cp select=936

If your Python program starts by some other means, you'll have to make it open its console window with the correct code page.
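
If changing the code page is not an option, the program can at least avoid the crash by escaping characters the console codec cannot represent. A minimal sketch, assuming escaped output is acceptable; console_safe is a hypothetical helper, not part of Python:

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target codec lacks replaced by \\uXXXX escapes."""
    # Fall back to ASCII if stdout has no usable encoding (assumption for this sketch).
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(enc, errors="backslashreplace").decode(enc)

print(console_safe("abc \u5377"))
```

With code page 936 active the character is printed as is; with cp437 it comes out as a \u5377 escape instead of raising UnicodeEncodeError.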
