开发者

Suffix Tree (Binary String): Find longest substring

Looking for an efficient algorithm to find the longest substring in a string that has its complementary string as well (bitwise).

That what I mean by saying complementary string bitwise:

10001开发者_运维百科1
011100


Here's a simple O(n) algorithm that relies on suffix tree construction.

Load the original string s and the complement string s' into the same suffix tree (O(n) time). Then post-process this tree by setting for each node x two flags f(x) and f'(x) that are true exactly when f(x) (resp. f'(x)) contains a suffix of s (resp. s'). Now simply traverse the tree looking for the deepest node that has both flags set and you have found the longest string in s whose complement also occurs in s. The post-processing also costs only O(n) time so the total running time is O(n).


The following algorithm is worst case O(n * n) and average case just a bit superlinear. It hits that worst case when there are a lot of long common substrings.

Consider the set of substrings that can be formed by starting at an arbitrary point in your string. Build a trie of those substrings that ends in a pointer to a location in the original string once there is only one possible match. You have to work on this trie for each of the n substrings, but you only have to follow it out through the longest common substring that string has with any other.

Once you've built this data structure, do a recursive walk through the trie looking looking to pair a substring with its complement. The structure of the trie should make this pairing very efficient since you only have to pair up opposing substrings, and not substrings with all of the other places in the string it might be.

If certain characters are common while their complements are uncommon, you might be able to improve performance by lazily building the trie.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜