How does the rsync algorithm correctly identify repeating blocks?
I'm on a personal quest to learn how the rsync algorithm works. After some reading and thinking, I've come up with a situation where I think the algorithm fails. I'm trying to figure out how this is resolved in an actual implementation.
Consider this example, where A is the receiver and B is the sender.
A = abcde1234512345fghij
B = abcde12345fghij
As you can see, the only change is that one copy of 12345 has been removed.
Now, to make this example interesting, let's choose a block size of 5 bytes (chars). Hashing each block on the sender's side with the weak checksum gives the following values list.
abcde|12345|fghij
abcde -> 495
12345 -> 255
fghij -> 520
values = [495, 255, 520]
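For concreteness, the numbers in this example match a plain sum of the byte values in each block (97+98+99+100+101 = 495 for abcde). A minimal sketch, assuming that toy byte-sum as the weak checksum (real rsync uses an Adler-32-style pair of sums, but the idea is the same):

```python
def weak_sum(block: bytes) -> int:
    # Toy weak checksum: plain sum of the byte values in the block.
    return sum(block)

B = b"abcde12345fghij"
BLOCK = 5

# Hash each 5-byte block of the sender's file.
values = [weak_sum(B[i:i + BLOCK]) for i in range(0, len(B), BLOCK)]
print(values)  # [495, 255, 520]
```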
Next, we hash the blocks of A and check whether each hash exists in the values list. If a block matches, we can skip to the end of that block for the next check; if it doesn't, we've found a difference. I'll step through this process.
- Hash the first block. Does this hash exist in the values list?
  abcde -> 495
  (yes, so skip)
- Hash the second block. Does this hash exist in the values list?
  12345 -> 255
  (yes, so skip)
- Hash the third block. Does this hash exist in the values list?
  12345 -> 255
  (yes, so skip)
- Hash the fourth block. Does this hash exist in the values list?
  fghij -> 520
  (yes, so skip)
- No more data, we're done.
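The walkthrough above can be mechanized as a lookup loop. Note what the loop naturally produces is not just a yes/no answer, but a sequence of matched block indices, one entry per matched block of A (the names here are my own, just for illustration):

```python
def weak_sum(block: bytes) -> int:
    # Toy weak checksum from the example: sum of byte values.
    return sum(block)

A = b"abcde1234512345fghij"
BLOCK = 5
# The values list computed on the sender's file B = b"abcde12345fghij":
values = [495, 255, 520]

matches = []
i = 0
while i + BLOCK <= len(A):
    h = weak_sum(A[i:i + BLOCK])
    if h in values:
        matches.append(values.index(h))  # record which of B's blocks matched
        i += BLOCK                       # skip to the end of this block
    else:
        i += 1                           # no match: slide forward one byte
print(matches)  # [0, 1, 1, 2]
```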
Since every hash was found in the values list, we conclude that A and B are the same, which, in my humble opinion, isn't true.
It seems to me this will happen whenever more than one block shares the same hash. I know I have skipped the step of calculating and checking the strong hash, but that won't make a difference, since the second and third blocks are exactly the same.
What am I missing?
The rsync algorithm sends two checksums: one for each chunk, and a "rolling" checksum for the whole file. In your example, A will see a difference in the rolling checksum once it gets to the "doubled-up" block.
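What makes the weak checksum "rolling" is that it can be updated in O(1) as the window slides one byte at a time: subtract the outgoing byte, add the incoming byte. A sketch using the same toy byte-sum checksum as the question's example (rsync's real checksum rolls in the same way, just with a slightly fancier formula):

```python
A = b"abcde1234512345fghij"
BLOCK = 5

# Checksum of the first window, computed from scratch.
s = sum(A[:BLOCK])

rolled = [s]
for i in range(1, len(A) - BLOCK + 1):
    # Slide the window one byte: drop A[i-1], add A[i+BLOCK-1].
    s = s - A[i - 1] + A[i + BLOCK - 1]
    rolled.append(s)

# Every rolled value agrees with a from-scratch computation of that window.
assert all(rolled[i] == sum(A[i:i + BLOCK]) for i in range(len(rolled)))
```

This cheap per-byte update is what lets the scanner try a match at every byte offset, not just at block boundaries.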