How to enforce uniqueness for a big (BLOB) database field
I'm designing a database (SQLite, SQL Server and DB2), where a table holds a 32kB blob that must be unique. The table will typically hold about 20,000 rows.
I can think of two solutions:
1 - Put a unique index directly on the blob column (see the sketch below).
2 - Calculate a hash of the blob, put a non-unique index on the hash, and write application code that enforces the blob's uniqueness when hashes collide.
Solution 1 is safer, but are the storage overhead and the performance penalty bad enough to make solution 2 the better choice?
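For concreteness, a minimal sketch of solution 1 in SQLite syntax (table and column names are hypothetical):

```sql
-- Solution 1: let the engine enforce uniqueness directly on the blob.
CREATE TABLE documents (
    id      INTEGER PRIMARY KEY,
    payload BLOB NOT NULL UNIQUE   -- the ~32 kB blob
);
```

This declaration works in SQLite, which compares BLOBs byte by byte, but other engines restrict or forbid index keys on large binary columns, which bears on the portability question.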
I would go with #2, partly as a space-saving measure, but more because some DBMSs don't allow indexes on LOB columns at all (Oracle comes to mind, though that may be an old restriction).
I would probably create two columns for hash values, MD5 and SHA1 (both commonly supported in client languages), then add a unique composite index covering those two columns. The likelihood of a collision on both hashes at once is infinitesimally small, particularly given your anticipated table size. You should still have a recovery strategy, though (which could be as simple as setting one of the values to 0).
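A minimal sketch of that layout, again in SQLite syntax with hypothetical names; the digests would be computed client-side (e.g. with your language's standard MD5/SHA-1 routines) before the INSERT:

```sql
-- Solution 2: store two client-computed digests and make the pair unique.
CREATE TABLE documents (
    id        INTEGER PRIMARY KEY,
    payload   BLOB NOT NULL,   -- the ~32 kB blob itself
    md5_hash  BLOB NOT NULL,   -- 16-byte MD5 digest of payload
    sha1_hash BLOB NOT NULL    -- 20-byte SHA-1 digest of payload
);

CREATE UNIQUE INDEX ux_documents_hashes
    ON documents (md5_hash, sha1_hash);
```

With this in place, a duplicate blob simply fails the INSERT with a constraint violation; the recovery strategy is only needed in the astronomically unlikely case that two different blobs collide on both digests.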