Generating word list from word
I'm looking for ideas to get started on what I would call a word obscurifier, or word list generator.
It takes a string, e.g. "hello", and generates similar-looking variants of it, i.e. it returns something like:
- h3ll0
- he11o
- HEL10
- h3LLo
- ...
- ...
As you can see, it needs to be case-sensitive.
I am just looking for ideas on how I could kick this off.
Maybe the first pass handles the capitalization:
- hello
- Hello
- hEllo
- Hello
- HEllo
- ...
and then that list/array is fed to a method that substitutes numbers/symbols.
I am confident in C# and will most likely use it (at least to start) for this application.
If something that does this kind of thing has already been written and is available, all the better; I'd love to hear about it.
Thanks for reading.
This is too long to be a comment, but it's not a real answer. Merely a suggestion. First, consider this link:
http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
You could think of your problem as computing a Cartesian product of a sequence of sequences. Just thinking about alphanumeric characters, each one has from 1 to 3 states: the original character in lower case (if applicable), in upper case (if applicable), and the numeric replacement (again, if applicable). Or, if you're starting with a number: the number itself, plus the upper- and lower-case letter replacements. For example:
A -> a, A, 4
B -> b, B, 8
C -> c, C
D -> d, D
// etc.
1 -> 1, L, l
2 -> 2
3 -> 3, e, E
// etc.
Each of those is a sequence. So in your problem, you might turn the original input "hello" into a process where you grab the sequence that corresponds to each character in the string, and then compute the Cartesian product of those sequences. The methodology in the linked blog post from Eric Lippert would be a great guide for continuing from here.
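As a rough illustration of the "grab the sequence for each character" step, here is a minimal sketch that builds those sequences instead of typing them out by hand. `CharacterOptions`, `Leet` and `OptionsFor` are hypothetical names, and the table only contains the letter/digit pairs listed above (A/4, B/8, L/1, E/3), so treat it as a starting point rather than a complete mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CharacterOptions
{
    // Hypothetical letter -> digit table; only the pairs from the list above.
    static readonly Dictionary<char, char> Leet = new Dictionary<char, char>
    {
        {'a', '4'}, {'b', '8'}, {'l', '1'}, {'e', '3'}
    };

    // Returns the states for one character: lower case, upper case,
    // and the digit replacement when the table has one.
    public static string[] OptionsFor(char c)
    {
        var options = new List<string>();
        if (char.IsLetter(c))
        {
            char lower = char.ToLowerInvariant(c);
            char upper = char.ToUpperInvariant(c);
            options.Add(lower.ToString());
            if (upper != lower)
                options.Add(upper.ToString());
            char digit;
            if (Leet.TryGetValue(lower, out digit))
                options.Add(digit.ToString());
        }
        else
        {
            // Digits and symbols: the character itself, plus any letters
            // in the table that map to it (e.g. '3' -> e, E).
            options.Add(c.ToString());
            foreach (var pair in Leet.Where(p => p.Value == c))
            {
                options.Add(pair.Key.ToString());
                options.Add(char.ToUpperInvariant(pair.Key).ToString());
            }
        }
        return options.ToArray();
    }

    static void Main()
    {
        // Print the option sequence for each character of a sample input.
        foreach (char c in "hel3")
            Console.WriteLine("{0} -> {1}", c, string.Join(", ", OptionsFor(c)));
    }
}
```

The arrays returned by `OptionsFor` are the sequences whose Cartesian product you would then compute.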
This sample puts Anthony Pegram's idea into code. I hardcoded your letter mappings and input, but you will be able to change this easily.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace SO5672236
    {
        static class Program
        {
            static void Main()
            {
                // Set up your letter mappings first
                Dictionary<char, string[]> substitutions = new Dictionary<char, string[]>
                {
                    {'h', new[] {"h", "H"}},
                    {'e', new[] {"e", "E", "3"}},
                    {'l', new[] {"l", "L", "1"}},
                    {'o', new[] {"o", "O"}}
                };

                // Take your input
                const string input = "hello";

                // Get the mapping for each letter in your input
                // (throws KeyNotFoundException for a letter with no mapping)
                IEnumerable<string[]> letters = input.Select(c => substitutions[c]);

                // Calculate the Cartesian product
                var cartesianProduct = letters.CartesianProduct();

                // Concatenate the letters of each combination into one word
                var result = cartesianProduct.Select(x => x.Aggregate(new StringBuilder(), (a, s) => a.Append(s), b => b.ToString()));

                // Print out the results
                result.Foreach(Console.WriteLine);
            }

            // This function is taken from
            // http://blogs.msdn.com/b/ericlippert/archive/2010/06/28/computing-a-cartesian-product-with-linq.aspx
            static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
            {
                IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
                return sequences.Aggregate(
                    emptyProduct,
                    (accumulator, sequence) =>
                        from accseq in accumulator
                        from item in sequence
                        select accseq.Concat(new[] { item }));
            }

            // This is a "standard" Foreach helper for enumerables
            public static void Foreach<T>(this IEnumerable<T> enumerable, Action<T> action)
            {
                foreach (T value in enumerable)
                {
                    action(value);
                }
            }
        }
    }
You should look into string permutation.
http://www-edlab.cs.umass.edu/cs123/Projects/Permutation/project6.htm
Start with a Dictionary:
    key: letter
    value: list of alternate choices for that letter
Create a new empty word.
For each letter in the word,
    randomly choose one of its alternate choices and add it to the new word.
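The steps above could be sketched in C# roughly as follows. `RandomObscurifier`, `Alternates` and `Obscure` are hypothetical names, and the table only covers the letters of "hello" with the substitutions that appear in the question's examples. Note that a random pick produces one variant per call; it does not enumerate every combination, and repeated calls can return the same word.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RandomObscurifier
{
    // Hypothetical table: letter -> alternate choices for that letter.
    static readonly Dictionary<char, string[]> Alternates = new Dictionary<char, string[]>
    {
        {'h', new[] {"h", "H"}},
        {'e', new[] {"e", "E", "3"}},
        {'l', new[] {"l", "L", "1"}},
        {'o', new[] {"o", "O", "0"}}
    };

    // Builds one new word by picking a random alternate for each letter.
    // Characters without an entry in the table are copied unchanged.
    public static string Obscure(string word, Random random)
    {
        var result = new StringBuilder();
        foreach (char c in word)
        {
            string[] choices;
            if (Alternates.TryGetValue(c, out choices))
                result.Append(choices[random.Next(choices.Length)]);
            else
                result.Append(c);
        }
        return result.ToString();
    }

    static void Main()
    {
        // Reuse a single Random instance so successive picks differ.
        var random = new Random();
        for (int i = 0; i < 5; i++)
            Console.WriteLine(Obscure("hello", random));
    }
}
```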