开发者

Compare Multiple Substrings

I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3. So input of

abcdef dfeabc

will return

1 abc

I am not sure how to go about solving the problem. I can compare the two strings, and see if they are completely equal. From there, i can compare length-1 string size, i.e. if dfeabc contains abcde. However, how can i get the program to do all possible str开发者_StackOverflow中文版ings, down to a minimal size of 3 characters? One issue is for the above example of length-1, I'd also have to do the string bcdef and compare that.


The naive way would be to get every single substring of string A and see if it's in string B.

Here's the naive way of doing that:

for ( int i = 0; i < a.length; i++ ) {
   for ( int j = i+1; j <= a.length; j++ ) {
       if (b.contains(a.substring(i, j))) {
           //if longer than last found match, record this match
       }
   }
}

The slightly more optimal way would be to look at longer substrings first, so that the first substring that matches is necessarily the longest.

for ( int length = a.length; length > 0; length-- ) {
     for ( int i = 0; i + length < a.length; i++ ) {
         if ( b.contains(a.substring(i, i+length)) ) {
             //this is the biggest match
         }
     }
}


If you wanted to solve this in a simple way, without using any Java API for searching, you could think of it like this: for every pair of possible starting indices i in the first string and j in the second string, step forward in both while the corresponding characters in the first and the second string are equal, and return a result if you did at least 3 steps.


You need to use the Longest Common Substring algorithm, a dynamic programming problem. Check Wikipedia's entry for an explanation of the algorithm and here for a sample implementation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜