Performance of Long IDs

2022-12-18 12:26 问答作者：

I've been wondering about this for some time. In CouchDB we have some fairly log IDs...eg:

"000ab56cb24aef9b817ac98d55695c6a"

Now if we're searching for this item and going through the tree structure created by the view. It seems a simple integer as an id would be much fast开发者_C百科er. If we used 64bit integers it would be a simple CMP followed by a JMP (assuming that the Erlang code was using JIT, but you get my point).

For strings, I assume we generate a hash off the ID or something, but at some point we have to do a character compare on all 33 characters...won't that affect performance?

The short answer is, yes, of course it will affect performance, because the key length will directly impact the time it takes to walk down the tree.

It also affects storage, as longer keys take more space, space takes time.

However, the nuance you are missing is that while Couch CAN (and does) allocated new IDs for you, it is not required to. It will be more than happy to accept your own IDs rather than generate it's own. So, if the key length bothers you, you are free to use shorter keys.

However, given the "json" nature of couch, it's pretty much a "text" based database. There's isn't a lot of binary data stored in a normal Couch instance (attachments not withstanding, but even those I think are stored in BASE64, I may be wrong).

So, while, yes an 64-bit would be the most efficient, the simple fact is that Couch is designed to work for any key, and "any key" is most readily expressed in text.

Finally, truth be told, the cost of the key compare is dwarfed by the disk I/O fetch times, and the JSON marshaling of data (especially on writes). Any real gain achieved by converting to such a system would likely have no "real world" impact on overall performance.

If you want to really speed up the Couch key system, code the key routine to block the key in to 64Bit longs, and comapre those (like you said). 8 bytes of text is the same as a 64 bit "long int". That would give you, in theory, an 8x performance boost on key compares. Whether erlang can create such code, I can't say.

From the CouchDB: The definitive guide book:

I need to draw a picture of this at some point, but the reason is if you think of the idealized btree, when you use UUID’s you might be hitting any number of root nodes in that tree, so with the append only nature you have to write each of those nodes and everything above it in the tree. but if you use monotonically increasing id’s then you’re invalidating the same path down the right hand side of the tree thus minimizing the number of nodes that need to be rewritten. would be just the same for monotonically decreasing as well. and it should technically work if you’re updates can be guaranteed to hit one or two nodes in the inside of the tree, though that’s much harder to prove.

So sequential IDs offer a performance benefit, however, you must remember this isn't maintainable when you have more than one database, as the IDs will collide.

Performance of Long IDs

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？