Enumerate all n-gram subwords in a phase
Any exis开发者_JAVA技巧ting function that can handle this problem? Input: A B C output: {A},{B}, {C}, {A B}, {B C}, {A B C}
note that {A C} or {C A} are not valid output.
In pseudo code:
for (i=0 .. n-1) {
for (j=i .. n-1) {
ngrams.add(phase[i:j])
}
}
phase[i:j]
is a slice starting at i
and ending at j
and n
is the length (in this case 3)
A B C
0 1 2
0:0 A
0:1 AB
0:2 ABC
1:1 B
1:2 BC
2:2 C
I figured it out: O(n^3) algorithm
public static void GenerateAllGrams(string query) {
string[] q = query.Split(' ');
int maxgram = q.Length;
for (int gram = 1; gram <= maxgram; gram++) {
for (int i = 0; i < q.Length - gram + 1; i++) {
string current = "";
for (int j = i; j < i + gram; j++) {
current += q[j] + " ";
}
Console.WriteLine(current.Trim());
}
}
}
In scheme:
(define (prefix x list)
(if (null? list)
nil
(cons (cons x (car list))
(prefix x (cdr list)))))
(define (subwords phrase)
(if (null? phrase)
nil
(cons (list (car phrase))
(cons (prefix (car phrase) (subwords (cdr phrase)))
(subwords (cdr phrase))))))
精彩评论