开发者

Creating tree data structure

i have some data:

A
AXNHJNEHWXNOECMEJK
DNFJNXYEEQWhsdbchjsxs
XMJQWsdsEOJdfsKMDJE

....

Each row is array and each letter is object. I have comparer function which could say that letter A is equavalent of letter a(actually it is not letter. It's russian words and comparer function use morphology to let me know that word are equal for example матрешка==матрешки==матрешкины and arrays are russian sen开发者_开发问答tences. For example: "Мама мыла раму"). I want to create tree data structure which looks like:

1) A
2.1) BA
2.2) DHBAFH
3.1) BEDMEWA
etc...

Otherwise child nodes must contain letters from parent nodes. If you know how to work google adwords i think you can understand me. My question is how to do that FAST. I need to create tree with thousands arrays. Compare function works very slow(it use big dictionary) that's why speed is real problem.

Some simple data(sorry for russian):

here is set of sentences

сайты        
сайты недорого
сайты дешево
сайты дешево и быстро
красивый сайт по доступным ценам 
хочу купить хороший стул 
стул по доступным ценам

we must create following tree data structure

1) сайты
1->2.1) сайты недорого
1->2.2) сайты дешево
1->2.3) красивый сайт по доступным ценам 
1->2.2->3) сайты дешево и быстро

other parent nodes:

1) хочу купить хороший стул 
1) стул по доступным ценам

Child nodes must contain more words then parent.


Well,

Seems that this link could be helpful for your problem

Fast String Searching With Suffix Trees: http://marknelson.us/1996/08/01/suffix-trees/

and

Suffix tree

http://en.wikipedia.org/wiki/Suffix_tree


Start with sentences that have one word. They all are going to be parent nodes, so this is simple.

Then continue with two-word sentences. You have to match each of them with every one-word parent node, which is going to be quite slow, because of your slow comparison function. You can do two optimizations, though: first check whether the words are exactly the same. You can do this yourself and it's going to be fast. Another one is to remember the results of the comparison function for every pair of compared words. You're going to waste some memory, but you're going to gain some speed.

When a node matches, add the sentence to it. When the sentence doesn't match any node, make it a parent node.

For sentences with gradually increasing lengths, you do the same, except you have to try matching children of a node that matched, to find the correct place to add the sentence.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜