
Sequential UID set generation for MySQL Char() or other Field

Tried Googling, but no luck.

Question: Best way to externally generate Sequential UID values for a MySQL field which must be representable as a string.

Reason:

Generic sequential UUID-ish values, so that inserts append in on-disk/page order (for write performance) and a date prefix makes index searches on the field fast from char[0] forward. The column will be indexed, but I'm looking for the best data layout to increase index read and table write performance, rather than a plain old UUID.

My initial thought is a date at some granularity (possibly a padded epoch) appended to, or replacing part of, a UUIDv4-generated string, i.e. [Unix epoch][remaining UUIDv4] in a fixed-width char field, but I am unsure whether this would have the desired in-page/on-disk ordering and index-searching behaviour. An example would be:

12904645950049bceba1cc24e80806dd

The values must be independent of MySQL itself, hence using UUIDs and timestamps rather than some variation of auto-incrementing.
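The scheme described above can be sketched as follows. This is a minimal illustration of the question's own idea (a zero-padded epoch prefix plus the tail of a UUIDv4, fixed at 32 characters); the function name and widths are illustrative, not an established format:

```python
import time
import uuid

def sequential_uid() -> str:
    """10-digit zero-padded epoch-seconds prefix + 22 chars of a UUIDv4,
    giving a fixed-width 32-character string that sorts by creation time
    down to one-second granularity."""
    epoch = str(int(time.time())).zfill(10)   # e.g. '1290464595'
    remainder = uuid.uuid4().hex[:22]         # random hex tail from a UUIDv4
    return epoch + remainder                  # fixed width: 32 chars
```

Note that within the same second the random tail dominates, so ordering is only approximate at that granularity; a millisecond or microsecond prefix tightens it at the cost of width.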

Does anyone who knows the internals of MySQL indexes have any suggestions (for InnoDB tables)?

Aiden


Might be a bit off-topic, but have a look at Twitter's Snowflake. They say it's:

  • (Roughly) Time Ordered (helps a lot to avoid expensive random primary key BTREE updates)
  • Directly Sortable
  • Compact

Not to mention other features (HA, etc.). You can either nick their algorithm or just use it as it stands.

The whole UID uses at most 64 bits of space, so I would guess it would be quite efficient to index - see http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/ (a counter-example with long keys).
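A Snowflake-style ID can be sketched in a few lines. This is a simplified, single-process illustration of the layout (Twitter's actual service adds coordination, HA, etc.); the class name and custom epoch constant are assumptions, though the bit widths (41-bit timestamp, 10-bit worker, 12-bit sequence) match the published design:

```python
import threading
import time

EPOCH_MS = 1288834974657  # a fixed custom epoch; any past instant works

class SnowflakeLite:
    """64-bit, roughly time-ordered IDs: 41 bits of milliseconds since a
    custom epoch, 10 bits of worker ID, 12 bits of per-millisecond sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    # sequence exhausted this millisecond: spin until the next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence
```

Because the high bits are the timestamp, numeric order is (roughly) insertion order, which is exactly what keeps the clustered-index inserts appending rather than splitting pages.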


I think you may need to be more specific about what you are trying to solve (what's the actual problem? why not auto_increment? what is your proposed schema?). To answer your internals question:

  • InnoDB stores data in an index (the clustered index), in 16K pages.

The risks of not inserting sequentially are at least two fold:

  1. If your working set does not fit in memory, you may need to do random IO to load a page from disk before inserting a value into it.

  2. There might not be space remaining in the page (InnoDB fills pages to about 93% and leaves a small gap for updates), which could result in the page needing to be split. More page splits = fragmentation and less optimal use of resources such as memory.

So, I think as long as you are at least approximately sequential, (1) isn't a concern for the primary key index (it could still be for any unique secondary indexes). You just need to worry about (2).


Why I said that understanding the problem is important is that there are so many ways to do this besides long GUIDs. For one, a BIGINT in MySQL is smaller than any data type you are probably considering, but has a range of 18 quintillion. You could allocate "chunks" of key space, N thousand at a time, to worker nodes and guarantee no duplicates. If a worker node crashes without using all of its allocated chunk, so what? It doesn't matter.
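The chunking idea above can be sketched like this. The `Coordinator` and `Worker` names, and the chunk size, are illustrative; in practice the coordinator's counter would live in durable storage (e.g. a single-row table updated transactionally):

```python
CHUNK = 10_000  # how many BIGINT ids each worker receives per allocation

class Coordinator:
    """Hands out disjoint ranges of the BIGINT key space."""

    def __init__(self):
        self.next_chunk_start = 1   # durable in a real deployment

    def allocate(self) -> range:
        start = self.next_chunk_start
        self.next_chunk_start += CHUNK
        return range(start, start + CHUNK)

class Worker:
    """Generates ids locally from its chunk; no coordination per id."""

    def __init__(self, coordinator: Coordinator):
        self.coord = coordinator
        self.ids = iter(coordinator.allocate())

    def next_id(self) -> int:
        try:
            return next(self.ids)
        except StopIteration:       # chunk exhausted: fetch a fresh one
            self.ids = iter(self.coord.allocate())
            return next(self.ids)
```

Ids from different workers can never collide because the ranges are disjoint, and any ids lost to a crash are simply gaps in the sequence, which InnoDB does not care about.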


Check out this question. It perhaps doesn't detail the specifics of MySQL indices, but it does give you some performance data, and code to generate sequential UIDs.

It seems MySQL indexing benefits greatly from sequential IDs, and according to the MySQL documentation, lookups rely on B-tree ordering (see the section "B-Tree Index Characteristics") to find the relevant results.

From memory, MySQL indexing (for string indices at least) relies first on the lexicographic ordering of the field, i.e. "Oh, it begins with an A? I have data that begins with an A, I'll fetch it for you..." rather than doing a full scan of every field.

And inserting the UIDs sequentially means the index does not have to reorder entries 'alphabetically' on each insert, or at least that work is dramatically reduced, hence the performance benefits mentioned above.

(Not really a solution, but an answer at least.)


What I do is use a fixed-width character field and prepend the current time (in microseconds) to a random unique string. This is nice because even if your server is hit twice in the same microsecond the result will still (likely) be unique. Under massive load this could produce duplicate IDs, but if you are worried about that you can check whether a row with the UID has already been created.

PHP:

$date = new DateTime();
// 'U' = Unix epoch seconds, 'u' = microseconds; used as the uniqid() prefix
$UUID = uniqid( $date->format('Uu'), FALSE);  // shorter: 13 extra hex chars
$UUID = uniqid( $date->format('Uu'), TRUE);   // longer: adds more entropy

This is what I use on my (rarely used) server, but it should hold up under bigger loads. As I said, to cover the slight chance that two identical keys are generated, check whether the key has already been used and assign a new one (this shouldn't happen often).
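The same idea, with the retry-on-collision check the answer suggests, can be sketched in Python. The function names are illustrative, and the `seen` set stands in for a real uniqueness check against the table (e.g. a SELECT or a unique-key insert):

```python
import os
import time

def time_prefixed_uid() -> str:
    """13-digit millisecond timestamp prefix + 16 random hex chars,
    analogous to the PHP uniqid() approach above."""
    return f"{int(time.time() * 1000):013d}{os.urandom(8).hex()}"

def unique_uid(seen: set) -> str:
    """Regenerate on the (unlikely) collision; `seen` models the table."""
    uid = time_prefixed_uid()
    while uid in seen:
        uid = time_prefixed_uid()
    seen.add(uid)
    return uid
```

In a real schema the "check" would simply be a unique index on the column, with the application retrying when the insert fails with a duplicate-key error.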
