开发者

Pairwise alignment of long string sequences

I want to find the globally optimal (or close to optimal) pairwise alignment between two long (tens of thousands) sequences of strings, but the algorithm is expected to operate on any object sequences. I also want to use my own distance function implementation to compute the similarity of two objects. For shorter sequences, I could use the dynamic time warping (DTW) algorithm but the DTW algorithm needs to compute and store a n*m distance matrix (n,m are lengths of the sequences) which is not feasible for longer sequences. Can y开发者_如何学JAVAou recommend such algorithm? A working implementation would be a plus.

The following example clarifies what the algorithm needs to do:

Input:
Sequence A: i saw the top of the mountain
Sequence B: then i see top of the mountains

Result:    
Sequence A:      i saw the top of the mountain
Sequence B: then i see     top of the mountains


I don't know, if I understood your requirements right, but it sounds to me like you are trying to solve a Stable Marriage Problem. The original solution of Gale and Shapley in the link is in O(n*m) time and O(n+m) space, if my memory serves me right. The implementation was pretty straightforward. There are also some later solutions with different variants of the problem.

You also could try to solve this problem by using Maximum Bipartite Graph Matching, e.g. here, for getting a different optimality criterion. This also can be done in O(n*m).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜