padding in generalized suffix tree and implementation resource
From the wikipedia page, it says using unique terminator strings $0
, $1
, …, $n-1
for a tree with n
strings, s1
, ..., sn
.
My question is: how to deal with situations in which there are li开发者_高级运维teral suffix of $i
for string i+1
? For example, my first string s1
is example$0
. What is the clever way of doing this?
Also, the implementation of suffix tree I found are mostly for a single string, not for the generalized version. Given a implementation for a single string, how can one easily extend it?
Thank you!
1st question: if you're using Unicode, you may use PUA codes (http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters) which are not assigned in your environment. Starting at U+E000 would do. If you're using 8-bit ascii, use a byte code which you know is not in your strings -- \003 (end of text) sounds appropriate -- instead of that '$'.
2nd question: just start over, only starting with the current tree instead of an empty one. The unique terminators guarantee that you'll never find yourself trying to split a leaf node.
精彩评论