How to algorithmically partition a keyspace?
This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.
I'm trying to divide a given keyspace (say, 128 bits) into equal sized partitions. I want the upper bound (highest key) of each partition.
Basically, how would I complete this?
#define KEYSPACE_BYTE_SIZE 16
#define KEYSPACE_BIT_SIZE (KEYSPACE_BYTE_SIZE * 8)
typedef struct _key
{
char byte[KEYSPACE_BYTE_SIZE];
} key;
key * partition_keyspace( int num_partitions )
{
key * partitions = malloc( sizeof(key) * num_partitions );
// ...
}
Edit:
I suppose another way of saying this is (pseudocode, with ^ meaning exponentiation rather than C's XOR):
for (i = 0; i < num_partitions; i++)
{
partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
}
Of course the problem is 2 ^ 128 is a very large number and can't be contained in any single integer variable in C with which to do the math (hence the char[16] struct).
I really don't want to use a large number library (or any library) for this.
Edit:
Although, in actuality, the numbers I'm looking for are:
for (i = 0; i < num_partitions; i++)
{
partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
}
The highest key in any particular partition has all of its key bits set to 1. If you have the lower n bits for your keys, and the upper m bits for your partition-ids, then all you need to do is run an m-bit counter, and concatenate it with n ones.
To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4) and the lower 6 for the keys. The highest key in each partition will be these four:
00 111111
01 111111
10 111111
11 111111
In order to generate them, all you need to do is:
for (int i = 0; i < num_partitions; i++)
highest_key = (i << 6) | 0x3f; // where 6 is key_bits and 0x3f is six ones.
Of course, this assumes num_partitions is a power of two.
Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.
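For what it's worth, here is a minimal sketch of that idea for a 128-bit key. It assumes the key is stored most significant byte first, that the number of partitions is 2^part_bits, and that part_bits is at most 32 so the counter fits in one uint32_t. The names key128 and highest_key_pow2 are made up for illustration (they are not from the question), and it uses uint8_t rather than char to sidestep signedness issues.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 16

/* Hypothetical key type: 16 bytes, most significant byte first. */
typedef struct {
    uint8_t byte[KEY_BYTES];
} key128;

/* Writes the highest key of partition `index` into `out`, assuming there
 * are 2^part_bits partitions and 1 <= part_bits <= 32. */
static void highest_key_pow2(uint32_t index, unsigned part_bits, key128 *out)
{
    assert(part_bits >= 1 && part_bits <= 32);
    assert(part_bits == 32 || index < (UINT32_C(1) << part_bits));

    /* Every bit below the partition id is a one. */
    memset(out->byte, 0xFF, KEY_BYTES);

    /* Left-align the counter in a 32-bit word... */
    uint32_t top = index << (32 - part_bits);
    /* ...and set the unused low bits of that word to ones. */
    if (part_bits < 32)
        top |= (UINT32_C(1) << (32 - part_bits)) - 1;

    /* Copy the word into the four most significant bytes of the key. */
    out->byte[0] = (uint8_t)(top >> 24);
    out->byte[1] = (uint8_t)(top >> 16);
    out->byte[2] = (uint8_t)(top >> 8);
    out->byte[3] = (uint8_t)top;
}
```

Calling highest_key_pow2(1, 2, &k) gives a key whose first byte is 0x7F and whose remaining bytes are 0xFF, which is the "01 111111" row of the 8-bit illustration stretched to 128 bits.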
I am not sure I understand the context of your question - I've not studied consistent hashing.
The question almost amounts to, "how can I sort without sorting".
Another approach might be to do this:
iter = seed() #initialize to the bottom of the hash keys
for(i = 0 to partitionbound)
{
iter = nextIter(iter);
}
This is in linear time. However, it requires no a priori knowledge of the key space except that there is some order which nextIter obeys.
If you are partitioning [0, 2^128) -> {values}, e.g., because you're doing some distributed computing or what have you, you're in much better luck, since integers are well-structured.
I would suggest the slightly silly idea of having 4 32-bit ints in a struct and writing your own bigint routine that solves what you need to solve.
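As a rough sketch of what such a hand-rolled routine might look like: the only bigint operation really needed here is dividing a multiple of 2^128 by a small integer, which is schoolbook long division over 32-bit words with a 64-bit intermediate. The code below is an assumption-laden illustration, not a tested library: the names key128w and partition_upper_bound are hypothetical, words are stored most significant first, and it computes floor((i + 1) * 2^128 / n) - 1. That is a slight variation on the formula in the question, chosen so the last partition always ends at the all-ones key.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit key: four 32-bit words, most significant first. */
typedef struct {
    uint32_t word[4];
} key128w;

/* Upper bound of partition i (0-based) out of n partitions:
 * floor((i + 1) * 2^128 / n) - 1. Assumes 0 < n and i < n. */
static key128w partition_upper_bound(uint32_t i, uint32_t n)
{
    key128w out;
    assert(n > 0 && i < n);

    /* The last partition ends at 2^128 - 1, i.e. all ones. */
    if (i + 1 == n) {
        for (int w = 0; w < 4; w++)
            out.word[w] = UINT32_MAX;
        return out;
    }

    /* Long division of (i + 1) * 2^128 by n. The leading digit (i + 1)
     * is smaller than n, so it contributes only a remainder; the four
     * digits that follow are all zero. */
    uint64_t rem = (uint64_t)i + 1;
    for (int w = 0; w < 4; w++) {
        uint64_t cur = rem << 32;            /* bring down a zero word */
        out.word[w] = (uint32_t)(cur / n);   /* fits: rem < n */
        rem = cur % n;
    }

    /* Subtract 1, borrowing through any zero words. */
    for (int w = 3; w >= 0; w--) {
        if (out.word[w] != 0) {
            out.word[w]--;
            break;
        }
        out.word[w] = UINT32_MAX;
    }
    return out;
}
```

With n = 4 this reproduces the power-of-two pattern (the first bound starts with 0x3FFFFFFF followed by all ones), and with n = 3 the first bound is 0x5555...54, so partition sizes differ by at most one key.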
If you have the freedom to not use C++, Common Lisp has bigints built in. I've found that handy.
If you have representable keys...
However, when seeking some equally sized k partitions in some space a with n elements, I would approach the problem like this:
if( n % k)
{
return "not equal-sized partition!";
}
//could be forking/threading, whatever.
size = n / k; //number of elements in each of the k partitions
for(int i = 0; i < n; i+=size)
{
process(i, i+size-1);
}
process(bottom, top)
{
sort(a[bottom], a[top]);
return a[top]; //you'll have to figure out where to dump the results.
}
Based on tzaman's answer, here is my solution. It allows up to 255 partitions (although this could be altered). It does NOT require a power of 2 num_partitions... it'll just make the last partition take up whatever's left.
Let me know if you see any bugs... :)
#include <assert.h>
#include <stdlib.h>
#include <string.h>
key * partition_keyspace( unsigned int num_partitions )
{
assert( num_partitions > 0 );
assert( num_partitions <= 0xFF ); // up to 255 partitions
key * partitions = (key *) malloc( sizeof(key) * num_partitions );
// fill every bit
memset( partitions, 0xFF, sizeof(key) * num_partitions );
// calculate how many bits of the top byte needs to be filled by 1's
unsigned char fill_bits = 0;
while (num_partitions > (1 << fill_bits)) fill_bits++;
fill_bits = 8 - fill_bits;
// fill the top byte with the base number of 1's
unsigned char fill_part = 0;
for (unsigned int i = 0; i < fill_bits; i++) fill_part |= 1 << i;
// last partition takes up whatever remains, so don't process it (hence the -1)
for (unsigned char i = 0; i < num_partitions - 1; i++)
{
partitions[i].byte[0] = fill_part | (i << fill_bits);
}
return partitions;
}