Enumerate all n-gram subwords in a phrase

Is there an existing function that can handle this problem? Input: A B C. Output: {A}, {B}, {C}, {A B}, {B C}, {A B C}.

Note that {A C} and {C A} are not valid outputs.


In pseudo code:

for (i=0 .. n-1) {
    for (j=i .. n-1) {
        ngrams.add(phrase[i:j])
    }
}

phrase[i:j] is the slice of words starting at index i and ending at index j (inclusive), and n is the number of words in the phrase (3 in this case).

A B C 
0 1 2

0:0 A
0:1 AB
0:2 ABC
1:1 B
1:2 BC
2:2 C
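
The pseudocode above can be sketched in Python as follows. This is only an illustration: the function name contiguous_ngrams is made up for this example, and it assumes the phrase is a whitespace-separated string. Note that Python slices exclude the end index, so the inclusive slice i:j from the pseudocode becomes words[i:j + 1].

```python
def contiguous_ngrams(phrase):
    """Return every contiguous run of words in phrase, each as a list of words."""
    words = phrase.split()  # assumption: words are separated by whitespace
    n = len(words)
    ngrams = []
    for i in range(n):
        for j in range(i, n):
            # Python slices exclude the end index, so j + 1 includes word j.
            ngrams.append(words[i:j + 1])
    return ngrams


for gram in contiguous_ngrams("A B C"):
    print(" ".join(gram))
```

The n-grams come out in the same order as the table above: A, A B, A B C, B, B C, C.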


I figured it out. This is an O(n^3) algorithm: there are O(n^2) n-grams, and each one takes up to n words to build.

// Requires: using System;
public static void GenerateAllGrams(string query) {
    string[] q = query.Split(' ');
    int maxgram = q.Length;
    // gram is the n-gram length, from single words up to the whole phrase.
    for (int gram = 1; gram <= maxgram; gram++) {
        // i is the index of the first word of the n-gram.
        for (int i = 0; i < q.Length - gram + 1; i++) {
            // Join the gram words starting at index i with single spaces.
            Console.WriteLine(string.Join(" ", q, i, gram));
        }
    }
}
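
To sanity-check the O(n^3) figure, here is a rough Python sketch that only counts work instead of building strings. The function count_work is hypothetical and not part of the answer above; it walks the same length-by-length loops and tallies how many n-grams are produced and how many words they contain in total. For n words these come to n(n+1)/2 and n(n+1)(n+2)/6 respectively, and the second is what grows cubically.

```python
def count_work(n):
    """Count the n-grams of an n-word phrase and the total words they contain."""
    ngrams = 0
    words_copied = 0
    for gram in range(1, n + 1):         # n-gram length
        for i in range(n - gram + 1):    # start index of the n-gram
            ngrams += 1
            words_copied += gram         # an n-gram of this length holds gram words
    return ngrams, words_copied


for n in (3, 10):
    ngrams, words_copied = count_work(n)
    # Compare the counts against the closed forms.
    assert ngrams == n * (n + 1) // 2
    assert words_copied == n * (n + 1) * (n + 2) // 6
    print(n, ngrams, words_copied)
```

For the three-word example this gives 6 n-grams containing 10 words in total.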


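In Scheme:

; Cons x onto the front of every list in lists.
(define (prefix x lists)
    (if (null? lists)
        '()
        (cons (cons x (car lists))
              (prefix x (cdr lists)))))

; All non-empty prefixes of phrase: (A B C) -> ((A) (A B) (A B C)).
(define (prefixes phrase)
    (if (null? phrase)
        '()
        (cons (list (car phrase))
              (prefix (car phrase) (prefixes (cdr phrase))))))

; All contiguous subwords: the prefixes of phrase, then those of its tail.
(define (subwords phrase)
    (if (null? phrase)
        '()
        (append (prefixes phrase)
                (subwords (cdr phrase)))))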