Can't input unicode in python IDE (Mac OS X)

I'm trying to collect some unicode raw_input in the default python IDE, and as far as I'm aware, it should be as simple as:

>>> c = raw_input()
日本語
>>> print c
日本語

However, when I try to input the unicode characters, the computer beeps in protest and I end up with an empty string. (To do this, I click on the IME switcher near the clock and select the appropriate input method, which in this case is Japanese input.) Outside of the python IDE, the input works fine: I can type the characters and the system recognizes them as having been input. In the IDE, I'll type some hiragana, and the drop-down kanji selection window appears as usual, but when I select the appropriate representation and hit enter, those beeps come and I wind up with nothing. I figure there's a setting involved somewhere that I've missed.

versions are:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin

and

Python 2.5.4 (r254:67916, Jun 24 2010, 21:47:25) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin

neither of which works. There's also this:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
>>> sys.getfilesystemencoding()
'utf-8'

but from what I've read, the defaultencoding is a mysterious beast. Changing it doesn't actually fix anything anyway. That is,

>>> import sys
>>> sys.setdefaultencoding('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'setdefaultencoding'
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> # !!!
... c = raw_input()
no dice!

doesn't work. Just more beeping. I can't cut-and-paste Japanese text from other applications, either.


The defaultencoding shouldn't matter here. I had a similar problem, and for me the solution was to check the Escape non-ASCII input option in Terminal > Preferences > Settings > Advanced. Also make sure that the Character encoding is set to Unicode (UTF-8) on the same settings page.


I've had the same problem. In my case it turned out to be a libedit problem. I fixed it by installing readline -- which I had to do from source (from here: http://pypi.python.org/pypi/readline) since using pip or easy_install, for whatever reason, didn't actually replace readline.

If you have ipython installed, it will tell you on startup if you're using libedit. And, if you have the same experience I did, you'll see the same problems in both the python interpreter in Terminal and in ipython. Once I got readline truly installed, and ipython no longer informed me that it was using libedit, the problems with entering Unicode disappeared in both python and ipython.

(Note: I also have bpython installed -- and, since it doesn't seem to use readline or libedit, but rather its own line-editing routines, entering Unicode in bpython always worked.)
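If ipython is not installed, a rough way to check which line-editing library the interpreter picked up is to look at the docstring of the readline module. This is only a sketch: it assumes that a libedit-backed build mentions "libedit" in readline.__doc__, and the helper names classify_readline and readline_backend are made up for illustration.

```python
def classify_readline(doc):
    """Guess the line-editing backend from the readline module docstring."""
    if doc is None:
        return "unknown"
    # Assumption: libedit-backed builds mention "libedit" in the docstring.
    return "libedit" if "libedit" in doc else "GNU readline"


def readline_backend():
    """Report the backend of the readline module in this interpreter."""
    try:
        import readline
    except ImportError:
        # The module is optional and may be missing on some platforms.
        return "unavailable"
    return classify_readline(readline.__doc__)


print(readline_backend())
```

If this prints libedit, the interpreter is in the situation described above, and replacing it with GNU readline may be worth trying.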


Edit: I tried Python from the command line (Terminal), and it does not work, and I get the beeps you are talking about. It doesn't seem to be a Terminal limitation, as I can paste the characters at the $ prompt in bash just fine. It does work in Idle, as I show below.

Edit #2: Interestingly, this one-liner does work:

 $ python -c "exec(\"c=raw_input()\nprint c\")"
 日本語  <-- pasted
 日本語
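A plausible explanation, consistent with the libedit answer, is that the line editor is only hooked in at the interactive prompt, so the one-liner reads the terminal without it. If that is right, reading the line from the stream directly should also sidestep the beeps. The sketch below is untested on the affected machines; read_line is a hypothetical helper name, and the UTF-8 default is an assumption matching the sys.stdin.encoding shown in the question.

```python
import io


def read_line(stream, encoding="utf-8"):
    """Read one line from a stream without going through raw_input()."""
    line = stream.readline()
    # Python 2 file objects return byte strings; decode them to unicode.
    if isinstance(line, bytes):
        line = line.decode(encoding)
    return line.rstrip(u"\n")


# Demonstration with an in-memory stream standing in for the terminal.
sample = io.BytesIO(u"\u65e5\u672c\u8a9e\n".encode("utf-8"))
print(read_line(sample) == u"\u65e5\u672c\u8a9e")  # True
```

At the interactive prompt the call would be read_line(sys.stdin) after import sys.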

I'd put this in a comment, but it wouldn't format correctly. Output from 2.6.5 on MacOSX:

Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "copyright", "credits" or "license()" for more information.

    ****************************************************************
    Personal firewall software may warn about the connection IDLE
    makes to its subprocess using this computer's internal loopback
    interface.  This connection is not visible on any external
    interface and no data is sent to or received from the Internet.
    ****************************************************************

IDLE 2.6.5      
>>> c=raw_input()
日本語
>>> print c
日本語
>>> c
u'\u65e5\u672c\u8a9e'
>>> 


Try this:

import codecs, sys
sys.stdin = codecs.getreader('UTF-8')(sys.stdin)
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)
sys.stderr = codecs.getwriter('UTF-8')(sys.stderr)

print u'\u65e5\u672c\u8a9e'

This works for me for non-ASCII characters when using PuTTY with the terminal encoding set to UTF-8. I see boxes because I do not have fonts for CJK characters installed, but I think this should do it for you.

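The reason this works is that the Python 2 interpreter can fall back to the 'ascii' codec for stdin, stdout and stderr, for example when the streams are not attached to a terminal or no encoding can be determined from the locale. Because ASCII only defines byte values 0 to 127, only characters in that range can be printed through it; wrapping the streams forces UTF-8 instead.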
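The ASCII limitation can be seen directly by encoding the same three characters with both codecs. This is a small illustration rather than a fix; can_encode is a helper name invented for the example.

```python
text = u'\u65e5\u672c\u8a9e'  # the three characters from the question


def can_encode(value, encoding):
    """Return True if value can be represented in the given encoding."""
    try:
        value.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(text, 'ascii'))   # False: all three are outside 0 to 127
print(can_encode(text, 'utf-8'))   # True
print(len(text.encode('utf-8')))   # 9: each character takes three bytes
```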
