Convert leet-speak to plaintext
I'm not that hip on the L33t language beyond what I've read on Wikipedia.
I do need to add a dictionary check to our password-strength-validation tool, and since leet-speak only adds trivial overhead to the password cracking process, I'd like to de-leet-ify the input before checking it against the dictionary.
To clarify the reasoning behind this: When required to add symbols to their passwords, many users will simply do some very predictable leet substitution on a common word to meet the number and symbol inclusion requirement. Because it is so predictable, this adds very little actual complexity to the password over just using the original dictionary word.
I don't know all the rules, especially the multi-character substitutions like "\/\/" for "W", and I'm certain this is a problem that has been addressed many times, including by open source projects.
I'm looking for code samples, but haven't found any so far. C# code would be a bonus, but code in any common language will help.
Also, it would be nice to have an extensible approach, as I understand this dialect evolves quickly. It would be nice to be able to add-in some rules in a year as those evolve.
And no, this is not the basis for my entire password strength check. This is only the part I am asking for help on in this post. So we are not distracted by other elements of password and security concerns, let me describe the password concerns that don't have to do with leet-speak:
We measure the bits of entropy in the password per NIST special publication 800-63, and require a policy-configurable equivalent measure (56 bits for example) for the password to be valid. This still leaves room for dictionary words that have been simply leet-ed and from an entropy perspective aren't a whole lot better than plain dictionary words.
I would simply like to tell users that "P@s5w0rd" is too close to a dictionary word, and they could probably find a stronger password.
I know there is a lot more to security considerations like the balance between passwords that humans can remember, and passwords that are secure. This isn't that question.
All I'm asking about is converting l33t to plaintext which should be nearly as fun and interesting of a topic as code golf. Has anyone seen any code samples?
I must say I think this is a bad idea. If you want passwords to be strong, come up with better requirements: at least 8 characters, both upper AND lowercase letters, at least one number, and at least one special character. Implement a maximum authorization failure counter before disabling an account. Once that's done, what are you worried about?
Also offering some code:
String password = @"\/\/4573Fu|_";
Dictionary<string, string> leetRules = new Dictionary<string, string>();
leetRules.Add("4", "A");
leetRules.Add(@"/\", "A");
leetRules.Add("@", "A");
leetRules.Add("^", "A");
leetRules.Add("13", "B");
leetRules.Add("/3", "B");
leetRules.Add("|3", "B");
leetRules.Add("8", "B");
leetRules.Add("><", "X");
leetRules.Add("<", "C");
leetRules.Add("(", "C");
leetRules.Add("|)", "D");
leetRules.Add("|>", "D");
leetRules.Add("3", "E");
leetRules.Add("6", "G");
leetRules.Add("/-/", "H");
leetRules.Add("[-]", "H");
leetRules.Add("]-[", "H");
leetRules.Add("!", "I");
leetRules.Add("|_", "L");
leetRules.Add("_/", "J");
leetRules.Add("_|", "J");
leetRules.Add("1", "L");
leetRules.Add("0", "O");
leetRules.Add("5", "S");
leetRules.Add("7", "T");
leetRules.Add(@"\/\/", "W");
leetRules.Add(@"\/", "V");
leetRules.Add("2", "Z");
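// Apply the longest keys first; otherwise /\ (A) consumes the middle of \/\/ (W)
// and the sample never decodes to WASTEFUL. Needs "using System.Linq;".
foreach (KeyValuePair<string,string> x in leetRules.OrderByDescending(r => r.Key.Length))
{
    password = password.Replace(x.Key, x.Value);
}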
MessageBox.Show(password.ToUpper());
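Since the question asks for something extensible, one option is to keep the rules outside the code. What follows is only a minimal sketch, assuming a made-up plain-text format with one `leet<TAB>plain` pair per line; `LeetRuleLoader`, `Parse` and `Apply` are illustrative names, not part of any existing library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeetRuleLoader
{
    // Parses lines of the form "<leet><TAB><plain>" (an assumed format).
    // Blank or malformed lines are skipped.
    public static List<KeyValuePair<string, string>> Parse(IEnumerable<string> lines)
    {
        var rules = new List<KeyValuePair<string, string>>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var parts = line.Split('\t');
            if (parts.Length != 2 || parts[0].Length == 0) continue;
            rules.Add(new KeyValuePair<string, string>(parts[0], parts[1]));
        }
        // Longest keys first, so "\/\/" (W) is tried before "\/" (V) or "/\" (A).
        return rules.OrderByDescending(r => r.Key.Length).ToList();
    }

    // Applies every rule in order with a plain (ordinal) string replace.
    public static string Apply(string input, IEnumerable<KeyValuePair<string, string>> rules)
    {
        foreach (var rule in rules)
        {
            input = input.Replace(rule.Key, rule.Value);
        }
        return input;
    }
}
```

The lines could come from `File.ReadAllLines` on whatever rules file you ship, so new substitutions can be added later without recompiling.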
Based on samy's answer above, here is an even further enhanced version. It allows for multiple output rules per input char, and in particular has rules set up for all non-alphanumeric chars to be dropped out of the string. The result is that you can send in the classic XKCD comic password of Tr0ub4dor&3 and get out Troubador.
I am using this for much the same purpose as the OP: to confirm that a password supplied to my system, which contains highly secured data, is not based on a dictionary word.
I'm taking the output of the decode function, and running it through a dictionary.
using System.Collections.Generic;
using System.Linq;

public class LeetSpeakDecoder
{
    // Decoded variations per input string, so repeated substrings are only decoded once.
    private Dictionary<string, IEnumerable<string>> Cache { get; set; }
    // Each leet key can map to several plaintext outputs (including "" to drop it).
    private Dictionary<string, List<string>> Rules = new Dictionary<string, List<string>>();

    public void AddRule(string key, string value)
    {
        List<string> keyRules = null;
        if (Rules.ContainsKey(key))
        {
            keyRules = Rules[key];
        }
        else
        {
            keyRules = new List<string>();
            Rules[key] = keyRules;
        }
        keyRules.Add(value);
    }

    public LeetSpeakDecoder()
    {
        Cache = new Dictionary<string, IEnumerable<string>>();
        AddRule("4", "A");
        AddRule("4", "a");
        AddRule(@"/\", "A");
        AddRule("@", "A");
        AddRule("^", "A");
        AddRule("13", "B");
        AddRule("/3", "B");
        AddRule("|3", "B");
        AddRule("8", "B");
        AddRule("><", "X");
        AddRule("<", "C");
        AddRule("(", "C");
        AddRule("|)", "D");
        AddRule("|>", "D");
        AddRule("3", "E");
        AddRule("6", "G");
        AddRule("/-/", "H");
        AddRule("[-]", "H");
        AddRule("]-[", "H");
        AddRule("!", "I");
        AddRule("|_", "L");
        AddRule("_/", "J");
        AddRule("_|", "J");
        AddRule("1", "L");
        AddRule("0", "O");
        AddRule("0", "o");
        AddRule("5", "S");
        AddRule("7", "T");
        AddRule(@"\/\/", "W");
        AddRule(@"\/", "V");
        AddRule("2", "Z");
        // Every non-alphanumeric character (and digit) may also simply be dropped.
        const string nonAlpha = @"0123456789!@#$%^&*()-_=+[]{}\|;:'<,>./?""";
        foreach (var currentChar in nonAlpha)
        {
            AddRule(currentChar.ToString(), "");
        }
    }

    public IEnumerable<string> Decode(string leet)
    {
        var list = new List<string>();
        if (Cache.ContainsKey(leet))
        {
            return Cache[leet];
        }
        DecodeOneCharacter(leet, list);
        DecodeMoreThanOneCharacter(leet, list);
        DecodeWholeWord(leet, list);
        list = list.Distinct().ToList();
        Cache.Add(leet, list);
        return list;
    }

    private void DecodeOneCharacter(string leet, List<string> list)
    {
        if (leet.Length == 1)
        {
            list.Add(leet);
        }
    }

    private void DecodeMoreThanOneCharacter(string leet, List<string> list)
    {
        if (leet.Length > 1)
        { // we split the word in two parts and check how many variations each part will decode to
            for (var splitPoint = 1; splitPoint < leet.Length; splitPoint++)
            {
                foreach (var leftPartDecoded in Decode(leet.Substring(0, splitPoint)))
                {
                    foreach (var rightPartDecoded in Decode(leet.Substring(splitPoint)))
                    {
                        list.Add(leftPartDecoded + rightPartDecoded);
                    }
                }
            }
        }
    }

    private void DecodeWholeWord(string leet, List<string> list)
    {
        if (Rules.ContainsKey(leet))
        {
            foreach (var ruleValue in Rules[leet])
            {
                list.Add(ruleValue);
            }
        }
    }
}
Here is my output
Tr0ub4dor&3
Tr0ub4dor&E
Tr0ub4dor&
Tr0ub4dor3
Tr0ub4dorE
Tr0ub4dor
Tr0ubAdor&3
Tr0ubAdor&E
Tr0ubAdor&
Tr0ubAdor3
Tr0ubAdorE
Tr0ubAdor
Tr0ubador&3
Tr0ubador&E
Tr0ubador&
Tr0ubador3
Tr0ubadorE
Tr0ubador
Tr0ubdor&3
Tr0ubdor&E
Tr0ubdor&
Tr0ubdor3
Tr0ubdorE
Tr0ubdor
TrOub4dor&3
TrOub4dor&E
TrOub4dor&
TrOub4dor3
TrOub4dorE
TrOub4dor
TrOubAdor&3
TrOubAdor&E
TrOubAdor&
TrOubAdor3
TrOubAdorE
TrOubAdor
TrOubador&3
TrOubador&E
TrOubador&
TrOubador3
TrOubadorE
TrOubador
TrOubdor&3
TrOubdor&E
TrOubdor&
TrOubdor3
TrOubdorE
TrOubdor
Troub4dor&3
Troub4dor&E
Troub4dor&
Troub4dor3
Troub4dorE
Troub4dor
TroubAdor&3
TroubAdor&E
TroubAdor&
TroubAdor3
TroubAdorE
TroubAdor
Troubador&3
Troubador&E
Troubador&
Troubador3
TroubadorE
Troubador
Troubdor&3
Troubdor&E
Troubdor&
Troubdor3
TroubdorE
Troubdor
Trub4dor&3
Trub4dor&E
Trub4dor&
Trub4dor3
Trub4dorE
Trub4dor
TrubAdor&3
TrubAdor&E
TrubAdor&
TrubAdor3
TrubAdorE
TrubAdor
Trubador&3
Trubador&E
Trubador&
Trubador3
TrubadorE
Trubador
Trubdor&3
Trubdor&E
Trubdor&
Trubdor3
TrubdorE
Trubdor
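The dictionary step itself can be little more than a set lookup over those candidates. This is a rough sketch rather than my production code: it assumes the word list is already in memory, `DictionaryCheck` and `FindDictionaryMatch` are illustrative names, and the four-letter minimum is an arbitrary threshold to avoid flagging tiny fragments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryCheck
{
    // Returns the first decoded candidate that is a dictionary word,
    // or null when none of them match. Matching ignores case.
    public static string FindDictionaryMatch(
        IEnumerable<string> candidates,
        IEnumerable<string> words,
        int minLength = 4)
    {
        var dictionary = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
        return candidates
            .Where(c => c.Length >= minLength)
            .FirstOrDefault(c => dictionary.Contains(c));
    }
}
```

You would pass the result of `Decode` as `candidates`; a non-null return value means the password is too close to a dictionary word.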
I'm finding the question interesting, so here is an additional answer as an intellectual exercise. Since leet speak cannot be mapped to a unique word, you have to examine all the possible decoded values that one leet speak chain can give. Here is some sample code:
using System.Collections.Generic;
using System.Linq;

public class LeetSpeakDecoder
{
    // Decoded variations per input string, so repeated substrings are only decoded once.
    private Dictionary<string, IEnumerable<string>> Cache { get; set; }
    private Dictionary<string, string> Rules { get; set; }

    public LeetSpeakDecoder()
    {
        Cache = new Dictionary<string, IEnumerable<string>>();
        Rules = new Dictionary<string,string>();
        Rules.Add("4", "A");
        // add rules here...
    }

    public IEnumerable<string> Decode(string leet)
    {
        var list = new List<string>();
        if (Cache.ContainsKey(leet))
        {
            return Cache[leet];
        }
        DecodeOneCharacter(leet, list);
        DecodeMoreThanOneCharacter(leet, list);
        DecodeWholeWord(leet, list);
        list = list.Distinct().ToList();
        Cache.Add(leet, list);
        return list;
    }

    private void DecodeOneCharacter(string leet, List<string> list)
    {
        if (leet.Length == 1)
        {
            list.Add(leet);
        }
    }

    private void DecodeMoreThanOneCharacter(string leet, List<string> list)
    {
        if (leet.Length > 1)
        { // we split the word in two parts and check how many variations each part will decode to
            for (var splitPoint = 1; splitPoint < leet.Length; splitPoint++)
            {
                foreach (var leftPartDecoded in Decode(leet.Substring(0, splitPoint)))
                {
                    foreach (var rightPartDecoded in Decode(leet.Substring(splitPoint)))
                    {
                        list.Add(leftPartDecoded + rightPartDecoded);
                    }
                }
            }
        }
    }

    private void DecodeWholeWord(string leet, List<string> list)
    {
        if (Rules.ContainsKey(leet))
        {
            list.Add(Rules[leet]);
        }
    }
}
The code considers that
- one character can be kept as is (DecodeOneCharacter)
- a word must be decoded by the combination of the decoded values for all the possible splits of the word (DecodeMoreThanOneCharacter)
- a word must be decoded directly against the rules (DecodeWholeWord)
The caching is very useful since the code is quite inefficient at tracking what permutations are useless: splitting "wasteful" into "w" and "asteful" or into "wa" and "steful" will lead to some repetitions of the decoding on the right, then eventually on the left. I don't have much data on it but it was regularly hit for more than 90% of the decodes; not too shabby for one small addition.
Since it returns a combination of all the possible leet decoded strings you may want to have a look at it for your solution: for example Fosco's code will decode "137M3P455" to "BTMEPASS" but you may want to know that it also translates to "LETMEPASS" -- which should make you cringe a bit more.
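To make the "137M3P455" ambiguity concrete, here is a small self-contained sketch. Note that it is not the class above: it scans left to right and memoises per start position instead of splitting the word in two, and the rule set is deliberately cut down to what this one example needs. `LeetAmbiguity` and `DecodeAll` are names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeetAmbiguity
{
    // A deliberately tiny rule set, just enough for "137M3P455".
    private static readonly Dictionary<string, string[]> Rules = new Dictionary<string, string[]>
    {
        { "13", new[] { "B" } },
        { "1",  new[] { "L" } },
        { "3",  new[] { "E" } },
        { "7",  new[] { "T" } },
        { "4",  new[] { "A" } },
        { "5",  new[] { "S" } },
    };

    // Returns every distinct decoding of the input, including the input itself.
    public static List<string> DecodeAll(string leet)
    {
        return DecodeFrom(leet, 0, new Dictionary<int, List<string>>());
    }

    private static List<string> DecodeFrom(string leet, int start, Dictionary<int, List<string>> memo)
    {
        if (start == leet.Length) return new List<string> { "" };
        if (memo.TryGetValue(start, out var cached)) return cached;

        var results = new HashSet<string>();
        int maxKey = Rules.Keys.Max(k => k.Length);
        for (int len = 1; len <= maxKey && start + len <= leet.Length; len++)
        {
            string chunk = leet.Substring(start, len);
            var heads = new List<string>();
            if (len == 1) heads.Add(chunk); // keep a single character as is
            if (Rules.TryGetValue(chunk, out var mapped)) heads.AddRange(mapped); // or apply a rule
            foreach (var head in heads)
                foreach (var tail in DecodeFrom(leet, start + len, memo))
                    results.Add(head + tail);
        }

        var list = results.ToList();
        memo[start] = list;
        return list;
    }
}
```

Calling `LeetAmbiguity.DecodeAll("137M3P455")` yields a list that contains both "BTMEPASS" and "LETMEPASS", which is exactly why a single greedy replace pass can miss the word you care about.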
Why don't you just implement a function to just create "pronounceable" passwords and require users to use them? It seems like much less work and better security.
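Something along these lines would do as a starting point. It is only a sketch under my own assumptions: the consonant and vowel alphabets are arbitrary choices, `PronounceablePassword` is a made-up name, and `RandomNumberGenerator.GetInt32` needs .NET Core 3.0 or later (on older frameworks you would have to draw the random indexes another way).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PronounceablePassword
{
    private const string Consonants = "bcdfghjklmnprstvz"; // 17 letters
    private const string Vowels = "aeiou";                 // 5 letters

    // Builds a password out of consonant-vowel syllables,
    // so 9 syllables give an 18-letter password.
    public static string Generate(int syllables)
    {
        var sb = new StringBuilder(syllables * 2);
        for (int i = 0; i < syllables; i++)
        {
            // Cryptographically strong, uniformly distributed indexes.
            sb.Append(Consonants[RandomNumberGenerator.GetInt32(Consonants.Length)]);
            sb.Append(Vowels[RandomNumberGenerator.GetInt32(Vowels.Length)]);
        }
        return sb.ToString();
    }
}
```

With these alphabets each syllable is one of 17 × 5 = 85 equally likely choices, about 6.4 bits, so roughly nine syllables are needed to reach the 56-bit example from the question.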