
Mysql – Detecting changes in data with a hash function over a part of table

I need to generate a single hash over some of the data in a table:

CREATE TABLE Table1
(
       F1             INT          UNSIGNED NOT NULL AUTO_INCREMENT,
       F2              INT          default     NULL,
       F3               Varchar(50)  default     NULL,
      ..
       FN              INT          default     NULL,
       PRIMARY KEY (F1)
);

i.e. over F1, F3, FN where F2=10.

SELECT md5(CONCAT_WS('#',F1,F3,FN)) FROM Table1 WHERE F2=10

gives one hash for each matching row.
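Keeping the per-row hashes, rather than only one combined hash, also tells you which rows changed. The following is a minimal client-side Python sketch, not MySQL code: it assumes you have already fetched (F1, F3, FN) for the rows with F2=10, and the helper names (concat_ws, row_hashes, changed_keys) are hypothetical. concat_ws mimics MySQL's CONCAT_WS by skipping NULL (None) values; the digests should match MySQL's MD5() only if the strings are encoded the same way (UTF-8 is assumed here).

```python
import hashlib

def concat_ws(sep, *values):
    """Mimic MySQL CONCAT_WS: NULL (None) arguments are skipped."""
    return sep.join(str(v) for v in values if v is not None)

def row_hashes(rows):
    """Map primary key F1 -> MD5 hex of CONCAT_WS('#', F1, F3, FN).

    rows: iterable of (F1, F3, FN) tuples, already filtered on F2=10.
    Assumes UTF-8 encoding of the concatenated string.
    """
    return {
        f1: hashlib.md5(concat_ws("#", f1, f3, fn).encode("utf-8")).hexdigest()
        for f1, f3, fn in rows
    }

def changed_keys(old, new):
    """Primary keys whose row was inserted, deleted, or modified."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))

before = row_hashes([(1, "a", 5), (2, "b", 6), (3, None, 7)])
after = row_hashes([(1, "a", 5), (2, "B", 6), (4, "d", 8)])
print(changed_keys(before, after))  # [2, 3, 4]
```

Row 2 was modified, row 3 was deleted and row 4 was inserted; row 1 is unchanged and is not reported.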

QUESTIONS

1) How do I get a single hash over the whole table?

2) What is the fastest hashing algorithm to use: MD5, SHA1, SHA, or something else?

EDIT:

MySQL 4.1 is being used, and it does NOT have trigger support.


1)

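SELECT MD5( GROUP_CONCAT( CONCAT_WS('#',F1,F3,FN) ORDER BY F1 SEPARATOR '##' ) ) FROM Table1 WHERE F2=10

(The ORDER BY F1 inside GROUP_CONCAT matters: without it the row order is not guaranteed, so the hash could change even when the data has not.)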
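To see what the single-hash query computes, here is a minimal client-side Python sketch of the same idea. It is an illustration, not MySQL itself: it assumes the (F1, F3, FN) rows for F2=10 are already fetched, the helper names are hypothetical, and sorting by F1 stands in for the ORDER BY inside GROUP_CONCAT.

```python
import hashlib

def concat_ws(sep, *values):
    # Mimics MySQL CONCAT_WS: NULL (None) arguments are skipped.
    return sep.join(str(v) for v in values if v is not None)

def table_hash(rows):
    """MD5 over GROUP_CONCAT(CONCAT_WS('#',F1,F3,FN) ORDER BY F1 SEPARATOR '##').

    rows: (F1, F3, FN) tuples already filtered on F2=10, in any order.
    Sorting by F1 plays the role of ORDER BY F1 so the result is stable.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    joined = "##".join(concat_ws("#", *r) for r in ordered)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

snapshot = [(2, "b", 6), (1, "a", 5)]
h1 = table_hash(snapshot)
h2 = table_hash(list(reversed(snapshot)))   # same data, different row order
h3 = table_hash([(1, "a", 5), (2, "b", 7)])  # FN changed in row 2
print(h1 == h2, h1 == h3)  # True False
```

Store the hash from one run and compare it with the next run; any difference means at least one of the selected rows or columns changed.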

2) Speed doesn't really matter: the hash function runs only once, over the concatenated string, and all of these hash functions are fast enough.


As for speed, you should try it and measure. It depends on how the functions are implemented.

Chances are, however, that you will see very little difference in speed. The hash functions you cite are all faster than what an average disk can spew out, so the question is not really "which hash function will make the code run fastest?" but "which hash function will leave the CPU most idle while it waits for data from the disk?".

On my Intel Core2 Q6600, clocked at 2.4 GHz (64-bit mode), with my own C implementation of hash functions, I get the following hashing speeds:

  • MD5: 411 MB/s
  • SHA-1: 336 MB/s
  • SHA-256: 145 MB/s
  • SHA-512: 185 MB/s

That's using a single core only. My hard disks top out at about 100 MB/s, so even with SHA-256 the hashing would need about 69% of one core (100/145), i.e. no more than about 17% of the machine's total CPU power across the Q6600's four cores. Of course, nothing guarantees that the implementation used by MySQL is that fast, which is why you should measure it yourself. Also, in 32-bit mode, SHA-512 performance decreases quite a bit.
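If you want to try this on your own machine, a rough single-core throughput check can be sketched in Python. Note the assumption: this measures Python's hashlib (usually backed by OpenSSL), not MySQL's MD5()/SHA1() implementation, so it only gives a ballpark for how the algorithms compare; the function name hash_throughput is hypothetical and "MB" here means MiB.

```python
import hashlib
import time

def hash_throughput(name, size_mb=32, chunk=1 << 20):
    """Rough single-core throughput of hashlib algorithm `name` in MiB/s."""
    data = b"\x00" * chunk  # 1 MiB buffer, hashed size_mb times
    h = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(size_mb):
        h.update(data)
    h.hexdigest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed

for algo in ("md5", "sha1", "sha256", "sha512"):
    print(f"{algo}: {hash_throughput(algo):.0f} MiB/s")
```

Dividing your disk's throughput by a measured rate gives the fraction of one core the hashing would need, which is the same reasoning as the 100/145 estimate above.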

Cryptographically speaking, (grave) weaknesses have been found in MD5 and SHA-1, so if you work in a security-relevant setting (i.e. you want to detect changes even if there is someone who can choose some of the changes and would prefer that you do not detect said changes), you should stick to SHA-256 or SHA-512, which, as far as we know, are robust enough. MD5 and SHA-1 are still fine in non-security situations, though.


I would use a MySQL Trigger to detect changes on insert, delete, update, etc.


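Although this thread is old, maybe this is what you need: http://dev.mysql.com/doc/refman/5.0/en/checksum-table.html (note that CHECKSUM TABLE checksums the entire table, so it cannot be limited to F1, F3, FN or to rows where F2=10).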


See BIT_XOR: http://dev.mysql.com/doc/refman/5.6/en/group-by-functions.html "Returns the bitwise XOR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision. This function returns 0 if there were no matching rows." For an example of usage, check pt-table-sync.
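The BIT_XOR idea is to turn each row into a 64-bit number (for example the first 16 hex digits of its MD5) and XOR them all together, which makes the result independent of row order and needs no GROUP_CONCAT length limit. In SQL this would be something like BIT_XOR(CAST(CONV(SUBSTRING(MD5(CONCAT_WS('#',F1,F3,FN)),1,16),16,10) AS UNSIGNED)), but treat that expression as an untested assumption. Below is a minimal Python sketch of the same aggregate; the helper names are hypothetical.

```python
import hashlib
from functools import reduce

def concat_ws(sep, *values):
    # Mimics MySQL CONCAT_WS: NULL (None) arguments are skipped.
    return sep.join(str(v) for v in values if v is not None)

def row_value64(row):
    """First 16 hex digits (64 bits) of the row's MD5, as an unsigned int."""
    digest = hashlib.md5(concat_ws("#", *row).encode("utf-8")).hexdigest()
    return int(digest[:16], 16)

def xor_checksum(rows):
    """BIT_XOR-style aggregate: order-independent, 0 when there are no rows."""
    return reduce(lambda acc, r: acc ^ row_value64(r), rows, 0)

a = [(1, "a", 5), (2, "b", 6)]
print(xor_checksum(a) == xor_checksum(list(reversed(a))))  # True
print(xor_checksum([]))  # 0
# Caveat: two identical rows cancel out, so XOR alone cannot see them.
print(xor_checksum(a + [(3, "c", 7), (3, "c", 7)]) == xor_checksum(a))  # True
```

The cancellation caveat is harmless here as long as the primary key F1 is part of each row's value, since two rows then can never be identical.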


If for any reason you can't use triggers, a different approach is to concatenate the rows with GROUP_CONCAT, like:

SELECT MD5( GROUP_CONCAT( CONCAT_WS('#',F1,F3,FN) ORDER BY F1 SEPARATOR ',' ) ) FROM Table1;
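The choice of separator inside CONCAT_WS matters. With an empty separator, different rows can concatenate to the same string, so a change from one to the other would not alter the hash. A small Python sketch (concat_ws is a hypothetical helper that mimics MySQL's NULL-skipping CONCAT_WS) shows this, along with a second caveat: because NULLs are skipped, a NULL in different columns can also produce identical strings.

```python
def concat_ws(sep, *values):
    # Mimics MySQL CONCAT_WS: NULL (None) arguments are skipped.
    return sep.join(str(v) for v in values if v is not None)

# Two different (F1, F3, FN) rows...
row_a = (1, "23", 4)
row_b = (12, "3", 4)
# ...produce the same hash input when the separator is empty.
print(concat_ws("", *row_a) == concat_ws("", *row_b))    # True
print(concat_ws("#", *row_a) == concat_ws("#", *row_b))  # False

# CONCAT_WS also skips NULLs, which can make different rows collide:
print(concat_ws("#", 1, None, 4) == concat_ws("#", 1, "4", None))  # True
```

A separator such as '#' only helps if it cannot appear in the data itself; nullable columns may need separate handling.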

But be aware that if the table has a lot of data, the query will be slow! If possible, exclude unnecessary columns from the CONCAT_WS.

Also note that by default group_concat_max_len is 1024, so the GROUP_CONCAT result is truncated to 1024 bytes; you may need to raise it first by running the following query:

SET group_concat_max_len = 18446744073709547520;

Note that 18446744073709547520 is the maximum value; you can use a smaller one. The effective length of the result is also capped by max_allowed_packet.
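Why the limit matters for change detection: once the concatenation is truncated, changes in rows past the cut-off no longer reach MD5(). The Python sketch below models group_concat_max_len by slicing the joined string; this is an assumption-laden illustration (MySQL counts bytes, which equals characters only for ASCII data like the sample here), and the helper names are hypothetical.

```python
import hashlib

def concat_ws(sep, *values):
    # Mimics MySQL CONCAT_WS: NULL (None) arguments are skipped.
    return sep.join(str(v) for v in values if v is not None)

def group_concat_md5(rows, max_len=1024):
    """MD5 of the rows' GROUP_CONCAT, truncated to max_len characters.

    Models group_concat_max_len; exact for ASCII-only data.
    """
    joined = "##".join(concat_ws("#", *r) for r in sorted(rows, key=lambda r: r[0]))
    return hashlib.md5(joined[:max_len].encode("utf-8")).hexdigest()

rows = [(i, "x" * 20, i) for i in range(1, 201)]  # well over 1024 characters
edited = rows[:-1] + [(200, "y" * 20, 200)]        # change only the last row
print(group_concat_md5(rows) == group_concat_md5(edited))  # True: change lost
big = 10 ** 6
print(group_concat_md5(rows, big) == group_concat_md5(edited, big))  # False
```

With the default limit the edit to the last row goes unnoticed; with a large enough limit it is detected.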
