开发者

minimal cyclic sub string in a bigger cyclic string

I am trying to f开发者_StackOverflowind an algorithm that culd return the length of the shortest cyclic sub string in a larger cyclic string.

A cyclic string would be defined as a concatenation of tow or more identicle strings, e.g. "abababab", or "aaaa"...

Now in a given for example a string T = "abbcabbcabbcabbc" there is a cycle of the pattern "abbc" but the shortest cyclic sub string would be "bb".


If you're just looking for a substring that appears more than once:

Build a Suffix tree from the string.

While creating the suffix tree, you can count re-occurrences of every substring and save it on the number of occurrences on the node.

Then just do a BFS search on the tree (which will give you a layered search, from shorter to longer strings) and find the first substring which is longer than 1 that occurred more than once.

Total complexity: O(n) where n is the length of the string

Edit:

The paths from the root to the leaves have a one-to-one relationship with the suffixes of S

You can implement the tree that each node contains one letter, that will give you better granularity and allow you to see all the substrings by length.

Here's a suffix tree of banana where every node contains one letter, you can see that you have all the substrings there.

minimal cyclic sub string in a bigger cyclic string

If you'll look at the applications section of the suffix tree, you'll see that it is used for exactly this kind of tasks - finding stuff about substrings.

Look at the image from the root, you can see ALL the substrings start from the root (BFS list):

b
a
n
ba
an
na
ban
ana
nan
bana
anan
nana
banan
anana
banana


Let me call "abbc" the generator in your example - i.e. the string that you repeat in order to get the bigger string.

The very first observation is that the smaller string should be made by repeating some substring twice.

It's clear that the smallest string should be smaller than the generator repeated twice (2*generator), because 2*generator is cyclic.

Now note that you only need to consider the string obtained by taking the generator 3 times, when searching for smaller cyclic string. Indeed, if the smallest is not there, but it is in the 4*generator, then it must span at least two generators, but then it wouldn't be the smallest.

So now lets assume the bigger string is 3*generator (or 2*generator). Also it's clear that if the generator has only different digits, then the answer is 2*generator. If not then you just need to find all pairs of identical characters in the bigger string say at position i and j and check whether the string starting a i, which is 2*(j-i) long is cyclic. If you try them in order of increasing j-i, then you can stop after the first success.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜