Obscuring database id's
I have a table with a primary key that is auto increment. I want to have an image associated with the primary key but I don't want the primary key to be revealed. Would naming the images something like:
$filename = md5($primarykey + $secret_string) . '.jpg';
be a good solution?
I am worried that there could be a collision and a file be over written.
The other option of course is to generate a random string, check it doesnt exist as a file and store it in the database... but id prefer not to store additional data if its开发者_如何学Go unnecessary.
The other option is a logical transformation youtube url style e.g 1=a 2=b but with a randomised order e.g 1=x 2=m... but then there is the chance of it being decoded... plus md5 would probably be lighter than any youtube url function.
I would guess I am dealing with over two million records so what is the likely hood of a collision? Which option would you pick or can you think of a better approach?
Use a linear congruential generator. If you choose the values properly, then you will have a pseudorandom sequence with a very large period. No collisions, but note that this is just an obfuscation method and won't provide any real security (but I assume that is not what you are looking for).
I would guess I am dealing with over two million records so what is the likely hood of a collision?
According to Wikipedia, you'll need more than 2*10^19 records to get a 50% probability to have at least one collision, so I'd say you don't have to worry.
Typically for obscuring database ids in URLs you would actually encrypt the id, since the obscuring needs to be reversible for the server to look up the database record from the obscured id. The reversibility may not be as important for your case of mapping database ids to file names, but it's a simple pattern that would also eliminate the possibility of collisions.
You'll want to convert the encrypted data to ASCII using base64 or hex encoding, e.g.:
base64_encode(encrypt(id, secret_key))
And decoding:
decrypt(base64_decode(id), secret_key)
(above is pseudo-code, you'll have to find the appropriate functions in PHP)
You don't need anything too fancy for the encryption. A simple block cipher like DES will suffice.
The Code you provided is well suited to do the job. You don't have to worry about collisions, no integer that your databases uses will result in the same md5 hash as any other.
If you want to be really sure: here is a litte test script ;)
<?php
for($i = 0; $i < 1000000; ++$i) {
$hash = md5($i);
if(isset($x[$hash])) { die("COLLISION!"); }
$x[$hash] = true;
} }
echo "All is well";
A more complex solution seems like a waste of your time but in case you want shorter Ids you could give a short random generated id to every picture (and check if it's alrealy been used on generation time and generate another one if thats the case)
There's really two options you have:
- Generate something & Verify no collisions
- Generate something & Hope for no collisions
You can generally use the following options: - A hash - A randomly generated string - A UUID
Hash If you're choosing a hash, choose something with a low incidence of collisions. Also, when doing a hash consider why you want to obscure DB ids. It won't take long for somebody to figure out your hashes if you're hashing plain numbers, you absolutely need to salt it. The advantages of a salted hash is quick generation and low chance of collisions (in small cases absolutely no need to verify for these, so faster inserts). The downside is that any proper implementation will be SHA256 or better, which means it's long. You can do some hex-conversions if you want to save DB/Index space, that may be more then you want.
Random String This you can generate to any length that suits you, of any character set or numbers a-Z0-9. This also means "more" data in a shorter string that's used in URIs, REQUEST data, etc. The downside is that you have to check if it's in the database.
A UUID Like a hash, fast to generate, fairly low chance of collisions and can be modified to be "less" ugly then pure outputs.
My Suggestion Don't do it. I've had to deal with this before on a very large implementation that grew from being a very small implementation. Eventually you start doing "smart" things like creating totally unique identifiers (e.g. content type + your identifier) and start seeing some value in it, but then you have to deal with scale. Scaling this is very difficult. DBs are optimized for ids as primary keys, there's a surprising large amount of thought you would need to put into this if you wanted it to scale vertically. If you must, only use it for external client interactions.
I agree about using Base64. You could also use a Guid. Or just store the file name in the table, which I've done.
In additional, take care to avoid orphaned files.
A hashing algorithm such as MD5 is not a good solution for this, as hashes by definition to not assure uniqueness (they create hashes that have a finite number of values).
What you want is an encryption. Take a look at javax.crypto.Cipher.
If you can add a column, add a GUID as a UNIQUE
column in the table and use the <GUID>.jpg
as the keyname. GUID algorithms should not produce duplicates in the forseeable future, but just-in-case the UNIQUE
constraint would catch this.
If you want to use the ID, but don't want to reveal the ID, and want a low risk of collisions; one option would be to use a hash of the ID. The hash would be consistent, irreversible, and (excepting huge datasets - 232ish depending on the hash used) unique for each photo. You seem to have the basic idea of it in your question:
$filename = md5($primarykey + $secret_string) . '.jpg';
You substitute MD5 for a hashing better algorithm of your choice. Preferably something with a larger output. Based on skimming a variety of articles in response to the comments, it would seem that SHA512 or something similar would be a better fit.
Just use a hash of the primary key. The chances of there being a collision is veeeery low.
精彩评论