开发者

padding in generalized suffix tree and implementation resource

From the wikipedia page, it says using unique terminator strings $0, $1, …, $n-1 for a tree with n strings, s1, ..., sn.

My question is: how to deal with situations in which there are li开发者_高级运维teral suffix of $i for string i+1? For example, my first string s1 is example$0. What is the clever way of doing this?

Also, the implementation of suffix tree I found are mostly for a single string, not for the generalized version. Given a implementation for a single string, how can one easily extend it?

Thank you!


1st question: if you're using Unicode, you may use PUA codes (http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters) which are not assigned in your environment. Starting at U+E000 would do. If you're using 8-bit ascii, use a byte code which you know is not in your strings -- \003 (end of text) sounds appropriate -- instead of that '$'.

2nd question: just start over, only starting with the current tree instead of an empty one. The unique terminators guarantee that you'll never find yourself trying to split a leaf node.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜