Get the list of index in subsequence matching
i have 2 sequences, for instance s=aaba and ss=aa, and i want all the way ss is in s. In this example: [0,1], [0,3] and [1,3] My code is below. It works fine, except for very long s with multiple ss. In that case i've got
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
(I already use java with -Xmx at the maximum I can…)
public static ArrayList<ArrayList<Integer>> getListIndex(String[] s, String[] ss, int is, int iss) {
ArrayList<ArrayList<Integer>> listOfListIndex = new ArrayList<ArrayList<Integer>>();
ArrayList<ArrayList<Integer>> listRec = new ArrayList<ArrayList<Integer>>();
开发者_如何学JAVA ArrayList<Integer> listI = new ArrayList<Integer>();
if (iss<0||is<iss){
return listOfListIndex;
}
if (ss[iss].compareTo(s[is])==0){
//ss[iss] matches, search ss[0..iss-1] in s[0..is-1]
listRec = getListIndex(s,ss,is-1,iss-1);
//empty lists (iss=0 for instance)
if(listRec.size()==0){
listI = new ArrayList<Integer>();
listI.add(is);
listOfListIndex.add(listI);
}
else{
//adding to what we have already found
for (int i=0; i<listRec.size();i++){
listI = listRec.get(i);
listI.add(is);
listOfListIndex.add(listI);
}
}
}
//In all cases
//searching ss[0..iss] in s[0..is-1]
listRec = getListIndex(s,ss,is-1,iss);
for (int i=0; i<listRec.size();i++){
listI = listRec.get(i);
listOfListIndex.add(listI);
}
return listOfListIndex;
}
Is there anyway to do this more efficiently ?
I doubt the recursion is the problem (think of what the maximum recursion depth is). The algorithm can be efficiently implemented by collecting the indecies of each character of s
in ss
in TreeSet
s and then simply taking the .tailSet
when needing to "advance" in the string.
import java.util.*;
public class Test {
public static Set<List<Integer>> solutions(List<TreeSet<Integer>> is, int n) {
TreeSet<Integer> ts = is.get(0);
Set<List<Integer>> sol = new HashSet<List<Integer>>();
for (int i : ts.tailSet(n+1)) {
if (is.size() == 1) {
List<Integer> l = new ArrayList<Integer>();
l.add(i);
sol.add(l);
} else
for (List<Integer> tail : solutions(is.subList(1, is.size()), i)) {
List<Integer> l = new ArrayList<Integer>();
l.add(i);
l.addAll(tail);
sol.add(l);
}
}
return sol;
}
public static void main(String[] args) {
String ss = "aaba";
String s = "aa";
List<TreeSet<Integer>> is = new ArrayList<TreeSet<Integer>>();
// Compute all indecies of each character.
for (int i = 0; i < s.length(); i++) {
TreeSet<Integer> indecies = new TreeSet<Integer>();
char c = s.charAt(i);
for (int j = 0; j < ss.length(); j++) {
if (ss.charAt(j) == c)
indecies.add(j);
}
is.add(indecies);
}
System.out.println(solutions(is, -1));
}
}
Output:
[[0, 1], [1, 3], [0, 3]]
ArrayList<Integer>
is quite memory-inefficient due to the overhead of the wrapper class. Using TIntArrayList
from GNU Trove will probably cut down your memory usage by a factor of 3 (or even more if you're running on a 64bit JVM).
Well, the basic problem is that your algorithm is recursive. Java doesn't do tail-call optimization, so every recursive call just adds to the stack until you overflow.
What you want to do is re-structure your algorithm to be iterable so you aren't adding to the stack. Think about putting a loop (with a termination test) as the outer-most element of your method instead.
Another way to look at this problem is to break it into two steps:
- Capture the positions of all of the given character ('a' in your example) into a single set.
All you want here is the complete set of combinations between them. Remember that the equation for number of combinations of r things chosen from n different things is:
C(n,r) = n!/[r!(n-r)!]
精彩评论