
Testing for UnicodeDecodeError in Python 3

I have the following test for a function that can only accept unicode text in Python 2.x:

def testNonUnicodeInput(self):
    """ Test failure on non-unicode input. """
    input = "foo".encode('utf-16')
    self.assertRaises(UnicodeDecodeError, myfunction, input)

However, that test fails when run in Python 3.x. I get:

AssertionError: UnicodeDecodeError not raised by myfunction

I'm trying to figure out how to set up a test that will continue to work in Python 2.x, but will also work after being run through 2to3 on Python 3.x.

I should probably note that I'm doing the following in my function to force unicode:

def myfunction(input):
    """ myfunction only accepts unicode input. """
    ...
    try:
        source = unicode(source)
    except UnicodeDecodeError, e:
        # Customise the error message while maintaining the original traceback
        e.reason += '. -- Note: Myfunction only accepts unicode input!'
        raise
    ...

Of course, that (along with the test) is being run through 2to3 before being run on Python 3.x. I suppose what I actually want on Python 3 is to not accept byte strings, which I thought I was doing by encoding the string first. I didn't use 'utf-8' as the encoding because I understand that that is the default.
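To make the failure concrete (my own illustration, not part of the original code): in Python 2, unicode() falls back to the ASCII codec and chokes on the UTF-16 bytes, but after 2to3 rewrites unicode() to str(), Python 3 happily stringifies the bytes object instead of raising:

>>> unicode("foo".encode('utf-16'))    # Python 2: ASCII decode of the BOM byte 0xff fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

>>> str("foo".encode('utf-16'))        # Python 3: no error, str() just returns the repr of the bytes
"b'\\xff\\xfef\\x00o\\x00o\\x00'"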

Anyone have any ideas for consistency here?


You shouldn't have to do anything to Python 3 strings; they're all Unicode. Just test isinstance(s, str). Or, if the problem is the other way around, you'd want to use bytes.decode().
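For what it's worth, a minimal sketch of that isinstance check (my illustration, not the asker's actual function):

def myfunction(source):
    """ myfunction only accepts unicode input. """
    # On Python 3 every str is already Unicode, so the only thing left to
    # guard against is being handed a bytes object.
    if isinstance(source, bytes):
        raise TypeError('myfunction only accepts unicode input, not bytes!')
    ...

Note that raising TypeError here would mean changing the test to expect TypeError rather than UnicodeDecodeError.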


Okay, a way to cause UnicodeDecodeError in both Python 3 and Python 2:

Python 3:

>>> "foo".encode('utf-16').decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
"foo".encode('utf-16').decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

Python 2:

>>> "foo".encode('utf-16').decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

Not sure if 2to3 would automatically convert the string literal to the b"foo" syntax, though. If it does, you'd just have to take out the b manually, or set it to ignore that somehow.
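If the goal is to keep the existing UnicodeDecodeError test, one way to use that decode inside the function (just a sketch, assuming bytes input may be interpreted as UTF-8):

def myfunction(source):
    """ myfunction only accepts unicode input. """
    if isinstance(source, bytes):  # 'bytes' is an alias for 'str' on Python 2
        # Decoding the UTF-16 test input as UTF-8 raises UnicodeDecodeError
        # on both Python 2 and Python 3, so the same assertRaises test works.
        source = source.decode('utf-8')
    ...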


Well, I've decided to just skip the test under Python 3 for now.

if sys.version_info < (3, 0):
    input = "foo".encode('utf-16')
    self.assertRaises(UnicodeDecodeError, myfunction, input)
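If Python 2.7 / 3.1 or newer is available, unittest's skipIf decorator would make the skip explicit rather than letting the test pass vacuously; a rough sketch inside the TestCase:

import sys
import unittest

class MyFunctionTests(unittest.TestCase):

    @unittest.skipIf(sys.version_info >= (3, 0),
                     "byte-string input does not raise UnicodeDecodeError on Python 3")
    def testNonUnicodeInput(self):
        """ Test failure on non-unicode input. """
        input = "foo".encode('utf-16')
        self.assertRaises(UnicodeDecodeError, myfunction, input)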

However, if someone could suggest a test that would pass under Python 2 & 3, I'm open to suggestions.
