What steps should I take to identify this algorithm?
I have an application that stores passwords in an MS SQL database, and I want to replicate the algorithm it uses to generate the stored values.
I have to admit I am a bit lost as to what I should try next.
Str in => Output
0 = 0x81
00 = 0x81 0x95
000 = 0x81 0x95 0x83
001 = 0x81 0x95 0x82
002 = 0x81 0x95 0x81
100 = 0x80 0x95 0x83
900 = 0x88 0x95 0x83
ddddddddd = 0x55 0x41 0x57 0x5E 0x4E 0x48 0x4F 0x57 0x40
dddddddddd = 0x55 0x41 0x57 0x5E 0x4E 0x48 0x4F 0x57 0x40 0x42
Characters found in output => from 0x21 to 0x9A
I tried XOR but did not get anything useful. I think each character must somehow depend on the previous one, but I cannot find anything linking them together.
Is there a process or order of things I should check while trying to figure this out? Could anyone please give me a hint or a push in the right direction? Any help would be very much appreciated.
EDIT: Testing your thoughts, SeanA, they seem to prove accurate...
dddddddddd = 0x55 0x41 0x57 0x5E 0x4E 0x48 0x4F 0x57 0x40 0x42
ddddd0dddd = 0x55 0x41 0x57 0x5E 0x4E 0x9C 0x4F 0x57 0x40 0x42
dd0ddddddd = 0x55 0x41 0x83 0x5E 0x4E 0x48 0x4F 0x57 0x40 0x42
1234567890123456... = 0x80 0x97 0x80 0x8E 0x9F 0x9A 0x9C 0x8B 0x9D 0x96 0x9A 0x98....
It certainly does appear that each character's value depends on its place in the string. There is no limit on the size of the password. When I have more time I will significantly increase the sample size; are there any particular values I should test that increase the chances of finding a relationship?
EDIT 2
I have completed a table (xls) as suggested; it is available at http://www.filedropper.com/result_1 and charts the difference between input and output as a decimal. There is a strong pattern between the input displayed as hex and the difference values. Each character position has very consistent patterns that run down the chart (for every character in that position). Also, the possible range for each position is consistent within each group of 16 values, and these groups align with changes in the most significant hex digit of the input value.
However, using the table I have found that I can encode passwords with the same result as the algorithm, so our theory seems to hold.
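A table like this could also be produced in code. The sketch below is only an illustration: it uses a handful of the samples quoted earlier in this question, and the name `difference_rows` is made up for the example. For each position it lists the output minus the input and, for comparison, the output XOR the input.

```python
# A few samples copied from the question: plaintext -> observed output bytes.
SAMPLES = {
    "000": [0x81, 0x95, 0x83],
    "001": [0x81, 0x95, 0x82],
    "002": [0x81, 0x95, 0x81],
    "100": [0x80, 0x95, 0x83],
    "900": [0x88, 0x95, 0x83],
    "dddddddddd": [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40, 0x42],
    "ddddd0dddd": [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x9C, 0x4F, 0x57, 0x40, 0x42],
}


def difference_rows(samples):
    """Return sorted, de-duplicated rows of
    (position, input char, output byte, output - input, output XOR input)."""
    rows = set()
    for plain, cipher in samples.items():
        for pos, (ch, out) in enumerate(zip(plain, cipher)):
            code = ord(ch)
            rows.add((pos, ch, out, out - code, out ^ code))
    return sorted(rows)


if __name__ == "__main__":
    print("pos char  out  diff  xor")
    for pos, ch, out, diff, xor in difference_rows(SAMPLES):
        print(f"{pos:>3} {ch!r:>4} 0x{out:02X} {diff:>5} 0x{xor:02X}")
```

For these few samples, the XOR column at a given position is the same for every digit, and the value for 'd' differs from it only in the top bit (for example 0xB1 versus 0x31 at position 0). Whether that holds across the full table is untested here.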
I would very much like to continue to investigate and find the algorithm itself, and really do appreciate any pointers helping me find the right direction.
Are there any particular values I should test that increase the chances of finding a relationship?
Cracking an unknown encryption algorithm is a matter of developing theories based on available evidence, and seeing if they work. In this case, we have a theory that this is a simple substitution cipher, with a different substitution table for each character position.
Attempt to prove this as follows:
Run through all possible characters in the 1st position and "read off" the corresponding enciphered character to give the substitution table for the 1st position.
Repeat for the 2nd, 3rd, etc.
Once all substitution tables have been captured, try a number of random passwords to confirm that the algorithm and tables have been guessed correctly.
If the final step succeeds, you've cracked the encryption algorithm for practical purposes, and the decryption algorithm is trivial to derive; e.g. by inverting the substitution tables.
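The steps above could be sketched roughly as follows. This is a minimal illustration, not the application's real algorithm: `toy_encode` is a hypothetical stand-in for whatever call produces the stored value, and all other names are invented for the example. It also assumes each position's table is one-to-one, which is what makes inversion possible.

```python
import random
import string


def toy_encode(password: str) -> bytes:
    """HYPOTHETICAL stand-in for the real application. Replace this with
    whatever actually produces the stored bytes for a given password."""
    return bytes((ord(ch) + 7 * (pos + 1)) % 256 for pos, ch in enumerate(password))


def build_tables(encode, alphabet, max_len, filler="0"):
    """Probe every character at every position and record its output byte."""
    tables = []
    for pos in range(max_len):
        table = {}
        for ch in alphabet:
            probe = filler * pos + ch  # filler in front, test char at `pos`
            table[ch] = encode(probe)[pos]
        tables.append(table)
    return tables


def encode_with_tables(tables, password):
    """Encode using only the captured tables."""
    return bytes(tables[pos][ch] for pos, ch in enumerate(password))


def invert_tables(tables):
    """Swap keys and values; assumes each table is one-to-one."""
    return [{out: ch for ch, out in table.items()} for table in tables]


def decode_with_tables(inverse, data):
    return "".join(inverse[pos][b] for pos, b in enumerate(data))


if __name__ == "__main__":
    alphabet = string.ascii_letters + string.digits
    tables = build_tables(toy_encode, alphabet, max_len=12)

    # Confirm the theory on random passwords before trusting the tables.
    rng = random.Random(0)
    for _ in range(100):
        pw = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 12)))
        assert encode_with_tables(tables, pw) == toy_encode(pw)

    inverse = invert_tables(tables)
    print(decode_with_tables(inverse, toy_encode("abc123")))
```

If the random-password check fails against the real application, the per-position theory is wrong or incomplete, for example because a character's output also depends on its neighbours.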
I must say: if this works out, this is a terribly lame encryption algorithm. Even if we haven't got it exactly right, there is far too much predictability.
It looks like the same character in the same position always produces the same output. Perhaps the algorithm is nothing more than a table lookup? Is there a maximum size password?
You could write code to print out all character outputs for position 1 to n, and then you have the algorithm solved and can use the lookup you create to further encode passwords.
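Before building the full lookup, the "same character, same position, same output" observation can be checked against the samples already posted. The sketch below is just one way to do that; the function name is invented, and for the long `1234567890123456...` sample only the 12 bytes actually shown in the question are used.

```python
# Samples copied from the question: (plaintext, observed output bytes).
SAMPLES = [
    ("0", [0x81]),
    ("00", [0x81, 0x95]),
    ("000", [0x81, 0x95, 0x83]),
    ("001", [0x81, 0x95, 0x82]),
    ("002", [0x81, 0x95, 0x81]),
    ("100", [0x80, 0x95, 0x83]),
    ("900", [0x88, 0x95, 0x83]),
    ("ddddddddd", [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40]),
    ("dddddddddd", [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40, 0x42]),
    ("ddddd0dddd", [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x9C, 0x4F, 0x57, 0x40, 0x42]),
    ("dd0ddddddd", [0x55, 0x41, 0x83, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40, 0x42]),
    # Only the first 12 output bytes were posted for this one.
    ("123456789012", [0x80, 0x97, 0x80, 0x8E, 0x9F, 0x9A,
                      0x9C, 0x8B, 0x9D, 0x96, 0x9A, 0x98]),
]


def collect_table(samples):
    """Build a partial (position, char) -> byte table and list any
    (position, char) pair that was seen with two different outputs."""
    table = {}
    conflicts = []
    for plain, cipher in samples:
        for pos, (ch, out) in enumerate(zip(plain, cipher)):
            key = (pos, ch)
            if key in table and table[key] != out:
                conflicts.append((key, table[key], out))
            table.setdefault(key, out)
    return table, conflicts


if __name__ == "__main__":
    table, conflicts = collect_table(SAMPLES)
    print(f"{len(table)} (position, char) pairs, {len(conflicts)} conflicts")
```

An empty conflict list does not prove the lookup theory, but a single conflict would disprove it.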
Doing this with 100% accuracy is theoretically impossible.
It's like asking, "Which mathematical curve fits all these points?", when, in fact, there are an infinite number of curves that can fit any arbitrary finite number of points.
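A tiny illustration of that point, with made-up functions `f` and `g`: both agree on every "observed" input, yet they are different rules and disagree on the next one.

```python
def f(x):
    return x


def g(x):
    # The extra term is zero at x = 0, 1 and 2, so g matches f there.
    return x + x * (x - 1) * (x - 2)


observed = [0, 1, 2]
assert all(f(x) == g(x) for x in observed)  # indistinguishable on the samples

print(f(3), g(3))  # prints: 3 9
```

The same holds for password samples: any finite set of input/output pairs is consistent with many different algorithms.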
If you can place restrictions on the algorithm, then you might be able to get a definite answer. For example, in a signals and systems class, you would learn that linear time-invariant systems (LTI systems) are completely characterized by their "impulse responses" -- in other words, you can completely determine an LTI system by testing how it responds to a special input called an impulse (an instantaneous "bang"). However, without such a restriction, there's an infinite number of possibilities.
Are you not able to look at the code or documentation for this application? If it's using any sort of strong encryption, you're going to have a hard time reverse engineering the key. It looks like it's using some sort of cipher feedback mechanism, since previous characters affect the encryption. On the other hand, it doesn't appear to be using something like an initialization vector, since the same starting characters generate the same output. See perhaps block cipher modes of operation for some clues there.
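One way to test the feedback idea is to change a single input character and see which output bytes move; with chaining, a change would normally ripple into later bytes. The sketch below does this for the two edited samples posted in the question (the helper name is invented for the example).

```python
def changed_positions(a, b):
    """Indexes at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Samples copied from the question.
BASE_IN = "dddddddddd"
BASE_OUT = [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40, 0x42]
VARIANTS = [
    ("ddddd0dddd", [0x55, 0x41, 0x57, 0x5E, 0x4E, 0x9C, 0x4F, 0x57, 0x40, 0x42]),
    ("dd0ddddddd", [0x55, 0x41, 0x83, 0x5E, 0x4E, 0x48, 0x4F, 0x57, 0x40, 0x42]),
]

for plain, out in VARIANTS:
    print(plain,
          "input changed at", changed_positions(BASE_IN, plain),
          "output changed at", changed_positions(BASE_OUT, out))
```

In these two samples only the edited position changes in the output, which is not what chaining would usually produce. Two samples are thin evidence, so more single-character edits would be needed to rule feedback out.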
Do you perhaps know what key is used? If you do, you could try guessing at the algorithm and its parameters. Especially if it's some common or well-known piece of software, you can probably search around the internet to find it. Or if it's your own software, then it's a matter of digging around the code to see how it's done. It could also just be some simple algorithm that's not strong encryption, perhaps even security by obscurity. In that case you or someone else might just be able to reverse engineer it with some experiments.