开发者

Are PyArg_ParseTuple() "s" format specifiers useful in Python 3.x C API?

I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x.

For the Python 2.x code, near the start of my function, I currently have a line:

    if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len))
    ...

I notice that the s# format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough"--the standard hashlib seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clea开发者_JAVA技巧n up the Unicode/byte string mess and not let the two be interchangeable.

So, I'm surprised to find that in Python 3.x, the s format specifiers for PyArg_ParseTuple() still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, making the s format specifiers unusable in practice. Is my analysis correct, or am I missing something?

Looking at the implementation for hashlib for Python 3.x (e.g. see md5module.c, function MD5_update() and its use of GET_BUFFER_VIEW_OR_ERROUT() macro) I see that it avoids the s format specifiers, and just takes a generic object (O specifier) and then does various explicit type checks using the GET_BUFFER_VIEW_OR_ERROUT() macro. Is this what we have to do?


I agree with you -- it's one of several spots where the C API migration of Python 3 was clearly not designed as carefully and thouroughly as the Python coder-visible parts. I do also agree that probably the best workaround for now is focusing on "buffer views", per that macro -- until and unless something better gets designed into a future Python C API (don't hold your breath waiting for that to happen, though;-).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜