Why do IPython and Python handle my string differently?

In python (2.7.1):

>>> x = u'$€%'
>>> x.find('%')
2
>>> len(x)
3

Whereas in ipython:

>>> x = u'$€%'
>>> x.find('%')
4
>>> len(x)
5

What's going on here?


Edit: here is the additional info requested in the comments.

ipython

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\xe2\x82\xac%'
>>> print x
$â¬%
>>> len(x)
5

python

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\u20ac%'
>>> print x
$€%
>>> len(x)
3


@nye17 It's officially not a good idea to ever call setdefaultencoding(). It is deleted from sys at interpreter startup for a reason, which is why reload(sys) is needed to get it back. One common culprit is gtk, which changes the default encoding on import and causes all kinds of problems, so if IPython has imported gtk, sys.getdefaultencoding() will return utf8. IPython does not set the default encoding itself.

@wim can I ask what version of IPython you are using? Part of the major overhaul in 0.11 was fixing many unicode bugs, though more do crop up (mostly on Windows, now).

I ran your test case in IPython 0.11, and IPython and Python behave the same there, so I think this bug is fixed.

Relevant values:

  • sys.stdin.encoding = utf8
  • sys.getdefaultencoding() = ascii
  • platforms tested: Ubuntu 10.04+Python2.6.5, OSX 10.7+Python2.7.1
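To compare these values between the plain interpreter and IPython, something like the following sketch can be pasted into both. The helper name encoding_report is made up for illustration, and the locale entry is an extra value that was not part of the test above.

```python
from __future__ import print_function  # keeps print() identical on Python 2 and 3
import locale
import sys


def encoding_report():
    """Collect the encodings that influence how typed text is interpreted.

    Hypothetical helper for illustration; not part of Python or IPython.
    """
    return {
        # Encoding of the input stream; may be None when stdin is not a terminal
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding used for implicit str <-> unicode conversions on Python 2
        "default": sys.getdefaultencoding(),
        # Encoding suggested by the locale settings (False: do not call setlocale)
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(encoding_report().items()):
    print("%-8s %s" % (name, value))
```

If the two shells print different values, that difference is a good place to start looking.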

As for an explanation: essentially, IPython 0.10 didn't recognize that input could be unicode. The multibyte utf8 input was not respected, so each byte was treated as one character, which you can see with:

In [1]: x = '$€%'

In [2]: x
Out[2]: '$\xe2\x82\xac%'

In [3]: y = u'$€%'

In [4]: y
Out[4]: u'$\xe2\x82\xac%'  # wrong!

Whereas, what should happen, and what does happen in 0.11, is that y == x.decode(sys.stdin.encoding), not repr(y) == 'u'+repr(x).
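The miscount can be reproduced without IPython. The sketch below only imitates the effect: it assumes that treating each UTF-8 byte as one character is equivalent to decoding the bytes as latin-1, which matches the repr shown above, but it is not IPython's actual code. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() on Python 2 and 3

correct = u'$\u20ac%'            # what u'$€%' should be: 3 code points
raw = correct.encode('utf-8')    # 5 bytes: '$', the 3-byte euro sign, '%'

# Imitate the 0.10 behavior: every byte becomes its own character.
broken = raw.decode('latin-1')

# What should happen: decode the bytes with the input encoding.
fixed = raw.decode('utf-8')

print(len(correct), correct.find(u'%'))  # 3 2
print(len(broken), broken.find(u'%'))    # 5 4
print(fixed == correct)                  # True
```

The 5 and 4 are the same numbers reported in the question, which is consistent with the bytes never having been decoded as UTF-8.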


If you run

import sys
sys.getdefaultencoding()

I think you will get different results in Python and IPython, possibly ascii in one and utf-8 in the other, so it may only be a matter of which default encoding each one is choosing.

The other test you can do is to type the following to enforce your locale's encoding as the default:

import sys, locale
reload(sys)
sys.setdefaultencoding(locale.getdefaultlocale()[1])
sys.getdefaultencoding()

then try the test of x in your question.
