Token Suffix Tree Tutorial

Can someone please point me to tutorials on "Token Suffix Trees"?


From googling that phrase and scanning the first couple of results, my guess is that it refers to a suffix tree in which the "letters" (or "characters", or "elements") are not individual ASCII or Unicode characters, as we are accustomed to, but rather the lexical tokens of some programming language.

So for C, for example, you would have one "letter" called int, another "letter" called (, and so on. I'm not sure exactly how tokens that are substrings of other tokens (e.g. + is a substring of ++) would be handled, but my guess is that they are handled the same way the lexer deals with them, which (for C at least) is by always greedily building the longest possible token, so the 5 input characters +++++ are lexed as ++, ++, +.
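To make that guess concrete, here is a minimal sketch in Python, not taken from any of the search results. It assumes a tiny, hypothetical operator table rather than the full C grammar, lexes greedily (longest token first), and then builds a naive uncompressed suffix trie whose edges are whole tokens. A real suffix tree would compress paths and use a linear-time construction; the names lex, build_token_suffix_trie and contains are made up for illustration.

```python
# Hypothetical operator table for a tiny C-like language.
# This is an assumption for illustration, not a complete C lexer.
OPERATORS = ["++", "+", "(", ")", ";", "="]


def lex(source):
    """Split source into tokens, always taking the longest matching operator."""
    ops = sorted(OPERATORS, key=len, reverse=True)  # try longer operators first
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isalnum() or ch == "_":
            # Identifier, keyword or number: consume the whole run.
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(source[i:j])
            i = j
            continue
        for op in ops:
            if source.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def build_token_suffix_trie(tokens):
    """Naive O(n^2) suffix trie: one path per suffix, one edge per token."""
    root = {}
    for start in range(len(tokens)):
        node = root
        for tok in tokens[start:]:
            node = node.setdefault(tok, {})
    return root


def contains(trie, pattern):
    """Return True if the token sequence occurs somewhere in the indexed input."""
    node = trie
    for tok in pattern:
        if tok not in node:
            return False
        node = node[tok]
    return True


tokens = lex("int x = y++;")
trie = build_token_suffix_trie(tokens)
print(lex("+++++"))                 # ['++', '++', '+']
print(contains(trie, ["y", "++"]))  # True
print(contains(trie, ["+"]))        # False
```

The last lookup shows the difference from a character-level suffix tree: the character + appears in the input, but only inside the single token ++, so the token-level structure does not report a match for a lone +.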


I'm not sure if this is what you are looking for, but your question reminds me of what I know as "suffix trees on words", e.g. http://www.larsson.dogma.net/words-alg.pdf
