How to find a checksum of the same checksum? (job-interview question)

2022-12-25 10:06 问答作者：

Devise a simple algorithm which creates a file which contains nothing but its own chec开发者_Go百科ksum.

Let's say it is CRC-32, so this file must be 4 bytes long.

There might be some smart mathematical way of finding it out (or proving that none exists), if you know how the algorithm works.

But since I'm lazy and CRC32 has only 2^32 values, I would brute force it. While waiting for the algorithm to go through all 2^32 values, I would use Google and Stack Overflow to find whether somebody has a solution to it.

In case of SHA-1, MD5 and other more-or-less cryptographically secure algorithms, I would get intimidated by the mathematicians who designed those algorithms and just give up.

EDIT 1: Brute forcing... This far I've found one; CC4FBB6A in big-endian encoding. There might still be more. I'm checking 4 different encodings: ASCII uppercase and lowercase, and binary big-endian and little-endian.

EDIT 2: Brute force done. Here are the results:

CC4FBB6A (big-endian)
FFFFFFFF (big-endian & little-endian)
32F3B737 (uppercase ASCII)

The code is here. On my overclocked C2Q6600 that takes about 1.5 hours to run. Now that program is single-threaded, but it would be easy to make it multi-threaded, which would give a nice linear scalability.

Aside from Jerry Coffin and Esko Luontola's good answers to an unusual problem, I'd like to add:

Mathematically, we're looking for X such that F(X) = X, where F is the checksum function, and X is the data itself. Since the checksum's output is of fixed size, and the input we are looking for is of the same size, there is no guarantee that such an X even exists! It could very well be that every input value of the fixed size is correlated with a different value of that size.

EDIT: Your question didn't specify the exact way the checksum is supposed to be formatted within the file, so I assumed you mean the byte-representation of the checksum. When strings and encodings and formatted-strings come to play, things become more complex.

Lacking any specific guidance to the contrary, I'd define the checksum of nonexistent data as a nonexistent checksum, so creating an empty file would fulfill the requirement.

Another typical method is a negative checksum -- i.e. after the data you write a value that makes the checksum of the whole file (including the checksum) come out to zero. In this case, you write a checksum of 0, and it all works out.

Brute force. This is Adler32, which I haven't implemented before, and didn't bother testing, so it's quite likely I've messed it up. I wouldn't expect a corrected version to run significantly slower, though, unless I've done something colossally wrong.

This assumes that the 32bit checksum value is written to the file little-endian (I didn't find a fixed point with it big-endian):

#include <iostream>
#include <stdint.h>
#include <iomanip>

const int modulus = 65521;

void checkAllAdlers(uint32_t sofar, int depth, uint32_t a, uint32_t b) {
    if (depth == 4) {
        if ((b << 16) + a == sofar) {
            std::cout << "Got a fixed point: 0x" << 
                std::hex << std::setw(8) << std::setfill('0') << 
                sofar << "\n";
        }
        return;
    }
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t newa = a + i;
        if (newa >= modulus) newa -= modulus;
        uint32_t newb = b + a;
        if (newb >= modulus) newb -= modulus;

        checkAllAdlers(sofar + (i << (depth*8)), depth + 1, newa, newb);
    }
    return;
}

int main() {
    checkAllAdlers(0, 0, 1, 0);
}

Output:

$ g++     adler32fp.cpp   -o adler32fp -O3 && time ./adler32fp
Got a fixed point: 0x03fb01fe

real    0m31.215s
user    0m30.326s
sys     0m0.015s

[Edit: several bugs fixed already, I have no confidence whatever in the correctness of this code ;-) Anyway, you get the idea: a 32 bit checksum which uses each byte of input only once is very cheap to brute force. Checksums are usually designed to be fast to compute, whereas hashes are usually much slower, even though they have superficially similar effects. If your checksum was "2 rounds of Adler32" (meaning that the target checksum was the result of computing the checksum and then computing the checksum of that checksum) then my recursive approach wouldn't help so much, there'd be proportionally less in common between inputs with a common prefix. MD5 has 4 rounds, SHA-512 has 80.]

Brute force it. CRC-32 gives you a string of length 8 containing digits and letters of A-F (in other words, it's a hexadecimal number). Try every combination, giving you 16⁸ = many possibilities. Then hash each possibility and see if it gives you the original string.

You can try optimizing it by assuming the solution will use each character no more than two or three times, this might make it finish faster.

If you have access to a CRC32 implementation, you can also try to break the algorithm and find a solution much faster, but I have no idea how you'd do this.

继续阅读：algorithm

How to find a checksum of the same checksum? (job-interview question)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？