padding in generalized suffix tree and implementation resource
From the wikipedia page, it says using unique terminator strings $0, $1, …, $n-1 for a tree with n strings, s1, ..., sn.
My question is: how to deal with situations in which there are li开发者_高级运维teral suffix of $i for string i+1? For example, my first string s1 is example$0. What is the clever way of doing this?
Also, the implementation of suffix tree I found are mostly for a single string, not for the generalized version. Given a implementation for a single string, how can one easily extend it?
Thank you!
1st question: if you're using Unicode, you may use PUA codes (http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters) which are not assigned in your environment. Starting at U+E000 would do. If you're using 8-bit ascii, use a byte code which you know is not in your strings -- \003 (end of text) sounds appropriate -- instead of that '$'.
2nd question: just start over, only starting with the current tree instead of an empty one. The unique terminators guarantee that you'll never find yourself trying to split a leaf node.
加载中,请稍侯......
精彩评论