Token Suffix Tree Tutorial
Can someone please point me to tutorials on "Token Suffix Trees"?
From googling that same phrase and scanning the first couple of results, my guess is that they are talking about a suffix tree in which the "letters" (or "characters", or "elements") are not individual ASCII or Unicode characters, as we are accustomed to, but rather the lexical tokens of some programming language.

So, e.g., for C you would have a "letter" called `int`, another letter called `(`, and so on. I'm not sure exactly how tokens that are substrings of other tokens (e.g. `+` is a substring of `++`) would be handled, but my guess is that they are handled the same way the lexer handles them. For C, at least, that means always greedily building the longest possible token, so the five input characters `+++++` will be lexed as `++`, `++`, `+`.
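To make the guess above concrete, here is a minimal Python sketch under several assumptions. The lexer is a hypothetical toy: it knows identifiers, integer literals and a small made-up subset of C punctuators, and it ignores comments, string literals and the rest of C's real lexical grammar. It applies the greedy "longest token wins" rule described above. The index is an uncompressed suffix *trie* over the token sequence (quadratic in size), not a true compressed suffix tree built with something like Ukkonen's algorithm. The names `lex`, `build_token_suffix_trie` and `find` are illustrative only, not from any library.

```python
# Hypothetical, deliberately tiny subset of C punctuators (a real C lexer has many more).
PUNCTUATORS = ["++", "+=", "+", "--", "-=", "-", "(", ")", "{", "}", ";", "="]


def lex(source):
    """Split source into tokens, always taking the longest matching token (maximal munch)."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier or keyword: letters, digits and underscores.
            j = i + 1
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(source[i:j])
            i = j
        elif ch.isdigit():
            # Integer literal (decimal digits only in this toy lexer).
            j = i + 1
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(source[i:j])
            i = j
        else:
            # Greedy step: among punctuators matching at i, pick the longest.
            candidates = [p for p in PUNCTUATORS if source.startswith(p, i)]
            if not candidates:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            match = max(candidates, key=len)
            tokens.append(match)
            i += len(match)
    return tokens


class Node:
    def __init__(self):
        self.children = {}  # token (str) -> child Node; each edge is one whole token
        self.starts = []    # token offsets of the suffixes passing through this node


def build_token_suffix_trie(tokens):
    """Insert every suffix of the token list into a trie whose alphabet is tokens."""
    root = Node()
    for start in range(len(tokens)):
        node = root
        for tok in tokens[start:]:
            child = node.children.get(tok)
            if child is None:
                child = node.children[tok] = Node()
            node = child
            node.starts.append(start)
    return root


def find(root, pattern):
    """Return the token offsets where the (non-empty) token sequence `pattern` occurs."""
    node = root
    for tok in pattern:
        node = node.children.get(tok)
        if node is None:
            return []
    return node.starts


if __name__ == "__main__":
    print(lex("+++++"))  # ['++', '++', '+']
    tokens = lex("x = y++ + z; y++;")
    trie = build_token_suffix_trie(tokens)
    print(find(trie, ["y", "++"]))  # [2, 7]
    print(find(trie, ["+"]))        # [4]
```

In this sketch, searching for the single token `+` returns only offset 4. The other plus signs were consumed as `++` tokens during lexing, so under this scheme `+` is never found "inside" `++`. Collapsing chains of single-child nodes into single edges would turn this trie into a suffix tree proper.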
I'm not sure if this is what you are looking for, but your question reminds me of what I know as "suffix trees on words", e.g. http://www.larsson.dogma.net/words-alg.pdf