Short "progress code" algorithm
Does anyone have ideas to securely encode a small amount of data into approximately 8 characters? I have a project (Flash and Flex (AS3) not that it really matters) that must be deployed on CD/DVD-ROM (no local storage nor network access) but with a requirement to allow people to come back where they left off. Therefore there will need to be a code that is human-friendly to print (on dead-tree or PDF) or write down and to enter back into an input field, as well as being secure to stop cheating between different students.
There are 14 assessment items that can be completed in any order. At the moment I CRC32 their "login name" and take 16 bits from that result, and combine them with the 14 assessment bits into 30 bits that is then encoded using a form of Base32, giving 6 characters, then adding a check character, then using a character replacement for encryption. Upon return to the program and entering the 7 character progress code, after decryption and extraction, if there is a match in the name hash parts, it is assumed the completion bits are correct.
One problem identified was that only two characters change when completing each item. Is there some algorithm that would completely change the 开发者_StackOverflowentire string if any one bit changes? Or is there a completely better way? I'd also like to store more data (dates, more bits for the hash to reduce collisions, location within the course, etc) but don't want to go above 8 or 10 characters.
All cryptographic-strength hash functions (e.g. MD5) have the property that changing a single input bit will change every output bit with nearly 50% probability. I would suggest a simple 2-stage procedure when generating the progress code:
- Represent the completed items as a 14-digit binary string (e.g. items 1, 2 and 4 completed would be represented by the string
11010000000000
). Take the first 4 bytes (8 hex digits) of the MD5 hash of this string. - Take the first 4 bytes of the MD5 hash of the user's login name.
XOR
these together, and report the result as an 8-hex-digit string.
When a user later returns to their work:
- Take the first 4 bytes of the MD5 hash of the user's login name.
- Prompt for the progress code.
XOR
these together to produce the first 4 bytes of the MD5 hash of the 14-digit binary string.- Look this up in a precomputed table containing all 2^14 = 16384 possible valid results to determine exactly which combination of items has been completed. (The number of possibilities is small enough that a linear scan would be feasible, though a binary search will be even faster.)
There's a small chance that the first 4 bytes of the MD5 hashes of all those 14-digit binary strings will not be distinct. According to the approximation here, but with n=2^14 and using 2^32 instead of 365, this has around a 3% chance of occurring. If if does, my suggestion is to append a few fixed characters to each and try again, until distinctness is achieved. This should not take many attempts given the low collision probability. Note that using just 7 hex digits raises the collision probability to around 40%, so don't do that! There's no need to use a separate check-digit scheme as only 2^14 in 2^32, or 1 in 262144 randomly guessed progress codes will actually be valid.
If you're concerned about savvy users being able to fake their own progress codes by undoing the XOR
of someone else's progress code and re-XOR
ing with the MD5 hash of their own username, you can always "salt" the usernames first by appending a fixed string (known only to you) to each username.
Like most encryption, it just needs to be stronger than the value of the data it's protecting.
By the sounds of it, you aren't up against governments or professional cryptographers...
If I can have a couple more characters, you could use a real encryption algorithm with a relatively small block size (CAST-128 with a 64 bit block size comes to mind). Base64 encoding will increase that to a 12 character code.
ie. Code = Base64(CAST128(username + completion bits)). To avoid cases of long usernames, you could hash it first:
Code = Base64(CAST128(Hash(Username) + completion bits))
Should be secure enough for an assessment program. (One glaring problem is that you need to embed the key in the software... but again, if you aren't up against government agents...)
精彩评论