Generating large numbers of unique random codes
I've got a project where we'll need to generate a lot of fixed-length random codes (Read: Millions) from a set of characters (EG: 12 digit alpha-numeric or 9 digit alpha-numeric lower-case only without the l character). We're going to then store these codes in an MSSQL database (SQL Server 2008). The language we're using is C#.
We also need to be able to generate more codes and add them to an existing set of codes with them being unique against themselves and the existing codes. The quantity of random codes generated will likely vary from millions down to merely hundreds.
The two obvious approaches that come to mind is either to gen开发者_如何学Cerate codes and just throw them at the database catching unique constraint exceptions or to pull the data down locally into a hash table then calculate all the new codes locally and put them into the database once generated.
Does anyone have any idea which of the above solutions would be more optimal or even better another solution that's more efficient that I haven't thought of?
Clarifications
The codes generated have to be non-predictable and there'll be multiple batches, each with uniqueness within themselves (EG: We'd have code set A with 100000 unique codes, code set B with 100000 unique codes, but there'd be no restriction that A intersect B is empty). They also have to be easy for a human to use (Hence the short length and potentially restricted character sets to avoid ambiguous characters).
The codes will be sent to users via various methods (Email, SMS, printed on paper, etc) and are used in a 1-use manner later (So if someone guesses someone else's code it'd be bad).
Have you considered using GUIDs (uniqueidentifiers in SQL Server)? They are unique and mostly random. You can generate them on either the client-side or on the server.
You might also think about using a CLR function on the SQL side, to help minimize the number of DB round-trips.
To ensure uniqueness, one approach is to append a unique, non-random number (such as the value of an identity column) to your random numbers. The result isn't random at the bit-by-bit level, but it is random when taken as a whole.
Generating millions of unique random numbers won't take long. Inserting them into the DB will be the slow part....
It really depends on the specific problem requirements. Do the codes have to be merely unique or also unpredictable? If they just have to be unique, then you can use a linear congruential random number generator to create your codes.
Wikipedia Page on Linear Congruential Generators
Here's some sample code:
class CodeGenerator
{
public long Seed
{
get { return _value; }
set { _value = value; }
}
private char[] alphabet =
{
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w'
};
public String GetCode()
{
// Generate the next value in the psuedo-random sequence.
_value = (362881L * _value + 76552897L) & 0xFFFFFFFFFFFL;
// Create the code. Add 2^44 to avoid small codes.
long code = _value + (1L << 44);
StringBuilder builder = new StringBuilder("123456789");
// The codes are all less than 2^45, so we have 45 bits of
// information and need 9 digits.
for (int i = 8; i >= 0; i--)
{
builder[i] = alphabet[code & 0x1F];
code = code >> 5;
}
return builder.ToString();
}
private long _value = 0;
}
The class will generate a sequence of 2^44 codes before repeating (over 17 trillion codes). To resume the sequence, simply record the current Seed value and restore it when you need more codes.
Generate all of them? In your first case you have a total of 35 characters per a position. Total storage is (base^positions) - 1 so your total number of combinations on the low end is 36^9 - 1 or 101,559,956,668,415 which is nearly a TB if the codes are one byte long...which they are not. And that's at the low end.
The best system is to pre-generate batches of valid numbers and feed these into the insertions. If the method of generation is semi-random then You can do this easily by dividing the random spaces using segments of the bit array. But you don't mention how random is random.
Of course if you have complete control over the randomness then you could just use UUIDs, which is what we do.
For generating highly unpredictable random values, may I suggest using the System.Security.Cryptography.RNGCryptoServiceProvider class.
Sample code for generating abitrary length strings of random characters from a pre-defined set shown below. This has been used in a password generator.
private string GetRandomAlphanumericCharacters(int length)
{
// Note: i, o, l, 0, and 1 have been removed to reduce
// chances of user typos and mis-communication of passwords.
char[] allowedCharacters = { 'a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H', /*'i', 'I',*/ 'j', 'J', 'k', 'K', /*'l', 'L',*/ 'm', 'M', 'n', 'N', /*'o', 'O',*/ 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z', /*'0', '1',*/ '2', '3', '4', '5', '6', '7', '8', '9' };
// Create a byte array to hold the random bytes.
byte[] randomNumber = new byte[length];
// Create a new instance of the RNGCryptoServiceProvider.
RNGCryptoServiceProvider Gen = new RNGCryptoServiceProvider();
// Fill the array with a random value.
Gen.GetBytes(randomNumber);
string result = "";
foreach (byte b in randomNumber)
{
// Convert the byte to an integer value to make the modulus operation easier.
int rand = Convert.ToInt32(b);
// Return the random number mod'ed.
// This yeilds a possible value for each character in the allowable range.
int value = rand % allowedCharacters.Length;
char thisChar = allowedCharacters[value];
result += thisChar;
}
return result;
}
精彩评论