How best to reduce primary key value?
I am developing a .NET application that supports Oracle, SQL Server and Informix as the data repositories. A problem with Informix is that one table (this is legacy stuff) has a primary key of 2048 characters, and Informix will not allow a PK of this width. So my initial solution is to have the app derive an MD5 value from the key value and use that as the primary key when inserting or looking up data. That works, but leaves me the immediate problem of 'upgrading' the data in existing databases, which for various reasons has to be done by means of a SQL script. Sadly Informix does not have a built-in MD5 function, so I will be hard pressed to write a SQL script to create the new PK column and populate it from the existing data.
So my question is: can anyone suggest a better way of significantly compressing a long string value, that will avoid this problem?
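For reference, the "derive a fixed-width key from the long value" idea described above might look like this (a minimal sketch in Python for illustration only; the real app is .NET, and the key value shown is hypothetical):

```python
import hashlib

def derive_pk(natural_key: str) -> str:
    """Derive a fixed-width 32-character hex MD5 digest from a long
    natural key, for use as the stored PK value."""
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()

# A hypothetical 2048-character legacy key (16 chars x 128).
key = "some-legacy-key-" * 128
print(derive_pk(key))       # always 32 hex characters, whatever the input length
```

The same input always produces the same digest, which is what makes lookups by the original value possible, but see the collision caveat in the answers below.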
Your approach is flawed: a PK must by definition be unique, and MD5 may generate collisions (duplicates).
Instead, consider using a surrogate PK (e.g. an identity column or a GUID).
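A sketch of the GUID flavour of this, in Python for illustration (the row layout is hypothetical): the surrogate is generated at insert time and is not derived from the row's data at all, so unlike a hash there is no pair of natural keys that maps to the same value.

```python
import uuid

def new_surrogate_pk() -> str:
    # A random version-4 GUID. Collisions are vanishingly unlikely,
    # and crucially the value does not depend on the row's contents.
    return str(uuid.uuid4())

row = {"pk": new_surrogate_pk(), "legacy_key": "the 2048-character value"}
print(row["pk"])   # e.g. '3f2b8c1a-....' (36 characters)
```

An identity column works the same way conceptually, with the database assigning a sequential integer instead of the application generating a GUID.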
can anyone suggest a better way of significantly compressing a long string value, that will avoid this problem
By definition you can't compress arbitrary strings and maintain uniqueness. Obviously if the strings have some structure that you know about, you could use this knowledge to create an application-specific compression algorithm.
In response to comments:
I also have a problem with surrogate keys, which bear no relation to the data being stored - bad database design
I know surrogate vs natural keys is a contentious subject, but surely your proposed MD5 hash is essentially a surrogate key? And in any case "all design is trade-off" so I wouldn't describe a database design as "bad" without some context. IMHO if there is no natural key shorter than 2048 characters, a surrogate key may well be a good option.
There are also performance tradeoffs to consider: with an MD5 or GUID surrogate PK you have the potential for page splits, as new rows will be inserted at effectively random positions in the index, rather than appended at the end as with an identity PK.
By what definition?
The key word is 'arbitrary'. A non-lossy compression algorithm such as ZIP does not guarantee to achieve a given compression ratio on all inputs - think of attempting to ZIP a ZIP archive.
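The point is easy to demonstrate: a lossless compressor can only shrink input that has structure to exploit. A quick illustration using Python's standard-library zlib on random (structureless) bytes:

```python
import os
import zlib

data = os.urandom(10_000)                 # arbitrary bytes: nothing to exploit
compressed = zlib.compress(data, level=9)

# Random input is incompressible; the "compressed" form comes out
# slightly LARGER because of the deflate framing overhead. This is the
# same effect as attempting to ZIP a ZIP archive.
print(len(data), len(compressed))
```

By the pigeonhole principle there are more distinct 2048-character strings than distinct shorter strings, so no scheme can map all of them to shorter values without collisions.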
In Informix, if you create a dbspace with large page sizes (you'd need to use 12, 14 or 16 KiB pages), you can create indexes on keys up to about 3 KiB in that dbspace (rule of thumb, 5 key values must fit on one index page).
But a key that big is probably not very efficient, to be polite about it. I'd be curious to see the breakdown of the columns in the PK and why they have to be so big that they add up to 2 KiB. Can you not use a surrogate of some sort?
I think you can split the key into two parts and store those parts in two columns, something like "id1" and "id2". Then you can create a composite primary key.
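A sketch of the split, in Python for illustration (the "id1"/"id2" column names are from the suggestion above; whether Informix accepts the resulting composite index still depends on the total key width, which splitting does not reduce):

```python
def split_key(natural_key: str, width: int = 1024):
    """Split a long natural key into two halves for columns id1 and id2."""
    return natural_key[:width], natural_key[width:]

def join_key(id1: str, id2: str) -> str:
    """Reassemble the original key from its two column values."""
    return id1 + id2

key = "k" * 2048                 # a hypothetical 2048-character key
id1, id2 = split_key(key)
assert join_key(id1, id2) == key # round-trips losslessly
```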