开发者

How to find largest substring among two Strings?

How to find l开发者_运维问答argest COMMON substring among two Strings?


Probably using a Suffix Tree. Create trees for both strings, then use those structures to look for paths that are common.


I think you can solve this using a pretty elegant dynamic programming algorithm that runs in O(mn) time and O(mn) space, where m and n are the number of characters in each string. The idea is based on the following recurrence. Let the two strings be A = a0 a1 a2 ... an-1 and B = b0 b1 b2 ... bm-1 and look at their first characters a0 and b0. Then there are three ways you can try to find the longest common subsequence:

  1. If the first characters are equal, one option would be to find the longest common subsequences of the rest of the two strings, then to prepend the first character to the match.
  2. Alternatively, you can decide not to match the first two characters. In that case, one option would be to see what the longest common subsequence you could make while ignoring the first character of the first string. 3 Finally, you could also ignore the first character of the second string.

This gives us a very nice recurrence:

LCS(A[0 .. n], B[0 .. m]) = longest of {
  A[0] + LCS(A[1 .. n], B[1 .. m])   (if A[0] == B[0]),
         LCS(A[1 .. n], B[0 .. m]),
         LCS(A[0 .. n], B[1 .. m])
    }

As our base cases, the longest common substring of any string and the empty string is the empty string:

LCS (A[n .. n], B[i, m]) = ""
LCS (A[i .. n], B[m, m]) = ""

This definition of the longest common substring allows you to compute the value of LCS(A[i .. n], B[j .. m]) given the three values LCS(A[i + 1 .. n], B[j + 1 .. m]), LCS(A[i .. n], B[j + 1 .. m]), and LCS(A[i + 1 .. n], B[j .. m]). Consequently, if you compute these values in the right order, you can just populate a table with the results in one pass and construct the result from there. Using some standard DP tricks, this can be made to run in O(mn).


Basically two ways: 1.dynamic programming, which cost O(mn) time and O(mn) space. "templatetypedef" has answered this. 2. suffix tree, by this you need to build it first, the building process with suffix link is O(m+n) (time and space), if without suffix link is O((m+n)^2) (time). Though building the suffix tree process is of same efficiency with dynamic programming in best case, however, after you have built it, you could get the longest common substring in O(k) time (k is the length of the LCS).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜