Trouble porting a function in Python to C using the Python C API

2023-02-04 17:06

I have a checksum function in Python:

def checksum(data):
    a = b = 0
    l = len(data)
    for i in range(l):
        a += ord(data[i])
        b += (l - i)*ord(data[i])

    return (b << 16) | a, a, b

that I am trying to port to a C module for speed. Here's the C function:

static PyObject *
checksum(PyObject *self, PyObject *args)
{
    int i, length;
    unsigned long long a = 0, b = 0;
    unsigned long long checksum = 0;
    char *data;

    if (!PyArg_ParseTuple(args, "s#", &data, &length)) {
        return NULL;
    }

    for (i = 0; i < length; i++) {
        a += (int)data[i];
        b += (length - i) * (int)data[i];
    }

    checksum = (b << 16) | a;
    return Py_BuildValue("(Kii)", checksum, (int)a, (int)b);
}

I use it by opening a file and feeding it a 4096-byte block of data. Both versions return the same values for small strings, but when I feed them binary data straight from a file, the C version returns wildly different values. Any help would be appreciated.

I would guess that some of your local variables overflow; b probably gets too large. Dump the values for debugging and you should see whether that is the problem. Since you mention that you are porting the function for performance reasons: have you tried psyco? It might be fast enough and is much easier. There are other tools that compile parts of Python code to C on the fly, but I can't recall their names offhand.

I'd suggest that the original checksum function is "incorrect". The value returned for checksum is unbounded: for any given size in MB, you could construct an input whose checksum is at least that large. If my calculations are correct, the value fits in 64 bits for inputs smaller than 260 MB, and b fits in an int for anything under 4096 bytes. I might be off with the numbers, but the point is that for larger inputs the two functions are guaranteed to behave differently.
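The estimates above can be checked with a few lines of Python. This is only a sketch: it assumes every byte has the worst-case value 255, so that b is at most 255 * n * (n + 1) / 2 for n bytes, and the helper names max_b and max_length_fitting are made up for illustration.

```python
def max_b(n):
    """Worst-case b for n bytes: every byte is 255, weights are n, n-1, ..., 1."""
    return 255 * n * (n + 1) // 2


def max_length_fitting(bits):
    """Largest input length whose worst-case b is still <= 2**bits - 1."""
    limit = 2 ** bits - 1
    lo, hi = 0, 1
    # Grow hi until the worst case no longer fits.
    while max_b(hi) <= limit:
        hi *= 2
    # Binary search with the invariant max_b(lo) <= limit < max_b(hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if max_b(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


print(max_length_fitting(31))  # b stored in a signed 32-bit int
print(max_length_fitting(63))  # b stored in a signed 64-bit long long
```

Under these assumptions the 32-bit limit comes out at 4103 bytes, which agrees with the 4096-byte estimate, and the 64-bit limit for b alone is on the order of 2.7e8 bytes.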

To translate the first function to C, you'd need to keep a and b in Python integers and perform the last calculation as a Python expression. This can be improved, though:

You could use C long long variables to store an intermediate sum and add it to the Python integers after a certain number of iterations. If the number of iterations is n, the maximum value for a is n * 255, and for b is len(data) * n * 255. Try to keep those under 2**63-1 when storing them in C long long variables.
You can use long long instead of unsigned long long and, in debug mode, raise a RuntimeError whenever a value becomes negative.
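The block-wise idea can be sketched in Python before writing it in C. The sketch below is an illustration, not the extension code: it assumes Python 3, where indexing a bytes object already yields values in 0..255, and the name checksum_blocked and the block parameter are hypothetical. The partial sums stand in for the C long long variables, and the two assertions encode the bounds n * 255 and len(data) * n * 255 mentioned above.

```python
INT64_MAX = 2**63 - 1


def checksum_blocked(data, block=4096):
    """Same result as the original checksum, accumulated block by block."""
    total = len(data)
    # Worst-case partial sums for one block must fit in a signed 64-bit value.
    assert block * 255 <= INT64_MAX
    assert total * block * 255 <= INT64_MAX

    a = b = 0                      # unbounded Python integers
    for start in range(0, total, block):
        part_a = part_b = 0        # these would be long long in the C module
        for offset, byte in enumerate(data[start:start + block]):
            part_a += byte
            part_b += (total - (start + offset)) * byte
        a += part_a                # flush the partial sums into the totals
        b += part_b
    return (b << 16) | a, a, b
```

In a C version, the byte buffer would have to be read as unsigned char for the values to match ord(), since plain char may be signed.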

Another solution would be to limit the Python equivalent to 64 bits by using a & 0xffffffffffffffff and b & 0xffffffffffffffff.
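A minimal sketch of that masked variant follows, assuming Python 3 and a bytes input. The name checksum64 is hypothetical, and masking the combined value as well is an extra assumption made here so that the result mirrors an unsigned 64-bit variable in C.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def checksum64(data):
    """Checksum with a and b wrapped to 64 bits after every step."""
    a = b = 0
    length = len(data)
    for i, byte in enumerate(data):
        a = (a + byte) & MASK64
        b = (b + (length - i) * byte) & MASK64
    # Assumption: the combined value is also truncated to 64 bits.
    return ((b << 16) | a) & MASK64, a, b
```

For small inputs this returns the same triple as the unmasked function; the two only diverge once a sum exceeds 64 bits.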

The best solution would be to use another kind of checksum, like binascii.crc32.
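For instance, binascii.crc32 accepts a running value as its second argument, so a file can be checksummed in 4096-byte blocks as in the question. This is a sketch assuming Python 3; the function name file_crc32 and its parameters are made up for illustration.

```python
import binascii


def file_crc32(path, block_size=4096):
    """CRC-32 of a file, computed incrementally one block at a time."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            # Feed the previous CRC back in to continue the computation.
            crc = binascii.crc32(block, crc)
    # Masking keeps the result unsigned even on old Python versions.
    return crc & 0xFFFFFFFF
```

The result always fits in 32 bits, so there is no overflow question to begin with.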
