How do I include unicode strings in Python doctests?

2022-12-11 05:19 问答作者：

I am working on some code that has to manipulate unicode strings. I am trying to write doctests for it, but am having trouble. The following is a minimal example that illustrates the problem:

# -*- coding: utf-8 -*-
def mylen(word):
开发者_高级运维  """
  >>> mylen(u"áéíóú")
  5
  """
  return len(word)

print mylen(u"áéíóú")

First we run the code to see the expected output of print mylen(u"áéíóú").

$ python mylen.py
5

Next, we run doctest on it to see the problem.

$ python -m
5
**********************************************************************
File "mylen.py", line 4, in mylen.mylen
Failed example:
    mylen(u"áéíóú")
Expected:
    5
Got:
    10
**********************************************************************
1 items had failures:
   1 of   1 in mylen.mylen
***Test Failed*** 1 failures.

How then can I test that mylen(u"áéíóú") evaluates to 5?

If you want unicode strings, you have to use unicode docstrings! Mind the u!

# -*- coding: utf-8 -*-
def mylen(word):
  u"""        <----- SEE 'u' HERE
  >>> mylen(u"áéíóú")
  5
  """
  return len(word)

print mylen(u"áéíóú")

This will work -- as long as the tests pass. For Python 2.x you need yet another hack to make verbose doctest mode work or get correct tracebacks when tests fail:

if __name__ == "__main__":
    import sys
    reload(sys)
    sys.setdefaultencoding("UTF-8")
    import doctest
    doctest.testmod()

NB! Only ever use setdefaultencoding for debug purposes. I'd accept it for doctest use, but not anywhere in your production code.

Python 2.6.6 doesn't understand unicode output very well, but this can be fixed using:

already described hack with sys.setdefaultencoding("UTF-8")
unicode docstring (already mentioned above too, thanks a lot)
AND print statement.

In my case this docstring tells that test is broken:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cm² sec)'.

    >>> beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    u'erg/(cm² sec)'
    '''

with "error" message

Failed example:
    beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
Expected:
    u'erg/(cm² sec)'
Got:
    u'erg/(cm\xb2 sec)'

Using print we can fix that:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cm² sec)'.

    >>> print beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    erg/(cm² sec)
    '''

This appears to be a known and as yet unresolved issue in Python. See open issues here and here.

Not surprisingly, it can be modified to work OK in Python 3 since all strings are Unicode there:

def mylen(word):
  """
  >>> mylen("áéíóú")
  5
  """
  return len(word)

print(mylen("áéíóú"))

My solution was to escape the unicode characters, like u'\xe1\xe9\xed\xf3\xfa'. Wasn't as easy to read though, but my tests only had a few non-ASCII characters so in those cases I put the description to the side as a comment, like "# n with tilde".

As already mentioned, you need to ensure your docstrings are Unicode.

If you can switch to Python 3, then it would work automatically there, as both the source encoding is already utf-8 and the default string type is Unicode.

To achieve the same in Python 2, you need to keep the coding: utf-8 next to which you can either prefix all docstrings with u, or simply add

from __future__ import unicode_literals

继续阅读：doctest python unicode

How do I include unicode strings in Python doctests?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？