Find the longest sequence S that is a subsequence of strings A, B, and C

Give a polynomial-time algorithm that takes three strings, A, B and C, as input, and returns the longest sequence S that is a subsequence of A, B, and C.


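Let dp[i, j, k] = length of the longest common subsequence of the prefixes A[1..i], B[1..j], C[1..k].

We have:

dp[i, j, k] = 0                                                      if i = 0 or j = 0 or k = 0
              dp[i - 1, j - 1, k - 1] + 1                            if A[i] = B[j] = C[k]
              max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise

This is the same as the 2D case, except that the table has three dimensions. Time and space complexity are both O(len A * len B * len C). The subsequence itself is recovered by walking back through the table from dp[len A, len B, len C].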
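The recurrence can be turned into code fairly directly. The following is a minimal sketch, not a tuned implementation: the name `lcs3` is made up for illustration, indices are shifted to Python's 0-based strings, and the walk-back step is one common way to recover an actual subsequence from the table of lengths.

```python
def lcs3(a, b, c):
    """Return one longest common subsequence of the strings a, b and c."""
    la, lb, lc = len(a), len(b), len(c)
    # dp[i][j][k] = LCS length of a[:i], b[:j], c[:k].
    # Any index equal to 0 means an empty prefix, so those cells stay 0.
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    dp[i][j][k] = max(dp[i - 1][j][k],
                                      dp[i][j - 1][k],
                                      dp[i][j][k - 1])

    # Walk back from the full-length corner to collect the characters.
    out = []
    i, j, k = la, lb, lc
    while i > 0 and j > 0 and k > 0:
        if a[i - 1] == b[j - 1] == c[k - 1]:
            out.append(a[i - 1])
            i, j, k = i - 1, j - 1, k - 1
        elif dp[i - 1][j][k] == dp[i][j][k]:
            i -= 1
        elif dp[i][j - 1][k] == dp[i][j][k]:
            j -= 1
        else:
            k -= 1
    return "".join(reversed(out))
```

For example, `lcs3("abecd", "acbed", "acd")` returns `"acd"`. When several subsequences share the maximum length, which one is returned depends on the tie-breaking order in the walk-back.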


Here's a solution in Python for an arbitrary number of sequences. You could use it to test your solution for the 2D and 3D cases. It closely follows Wikipedia's algorithm:

#!/usr/bin/env python
import functools
from itertools import starmap

# memoize must be defined before it is used as a decorator on lcs
def memoize(func):
    """Cache results of `func`, keyed by its (hashable) positional arguments."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            r = cache[args] = func(*args)
            return r
    return wrapper

@memoize
def lcs(*seqs):
    """Find longest common subsequence of `seqs` sequences.

    Complexity: O(len(seqs)*min(map(len,seqs))*reduce(mul,map(len,seqs)))
    """
    if not all(seqs):  return () # at least one sequence is empty
    # split every sequence into its first element and the rest;
    # with no sequences at all, this unpacking raises ValueError
    heads, tails = zip(*[(seq[0], seq[1:]) for seq in seqs])
    if all(heads[0] == h for h in heads): # all seqs start with the same element
        return (heads[0],) + lcs(*tails)
    # otherwise drop the head of one sequence at a time and keep the longest result
    return max(starmap(lcs, (seqs[:i]+(tails[i],)+seqs[i+1:]
                             for i in range(len(seqs)))), key=len)

Note: without memoization this is an exponential-time algorithm. Solving the recurrence for the number of calls (Wolfram Alpha):

$ RSolve[{a[n] == K a[n-1] + K, a[0] == K}, a[n], n]
a(n) = (K^(n + 1) - 1) K/(K - 1)

where K == len(seqs) and n == max(map(len, seqs))
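The closed form is easy to sanity-check numerically. The sketch below only iterates the recurrence a(n) = K a(n-1) + K with a(0) = K and compares it with the formula; it does not instrument `lcs` itself, and both function names are made up for illustration.

```python
def calls_by_recurrence(K, n):
    """Iterate a(n) = K*a(n-1) + K starting from a(0) = K."""
    a = K
    for _ in range(n):
        a = K * a + K
    return a

def calls_closed_form(K, n):
    """Evaluate (K^(n+1) - 1) * K / (K - 1); needs K >= 2 to avoid division by zero."""
    # K**(n+1) - 1 is always divisible by K - 1, so integer division is exact.
    return (K ** (n + 1) - 1) * K // (K - 1)

# Compare both for a few small values of K (number of sequences) and n (length).
for K in range(2, 6):
    for n in range(0, 8):
        assert calls_by_recurrence(K, n) == calls_closed_form(K, n)
```

For K = 3 sequences of length n = 2 both give 39; the count grows roughly as K^(n+1), which is why memoization matters here.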

Examples

>>> lcs("agcat", "gac")
('g', 'a')
>>> lcs("banana", "atana")
('a', 'a', 'n', 'a')
>>> lcs("abc", "acb")
('a', 'c')
>>> lcs("XMJYAUZ", "MZJAWXU")
('M', 'J', 'A', 'U')
>>> lcs("XMJYAUZ")
('X', 'M', 'J', 'Y', 'A', 'U', 'Z')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGHI")
('M', 'J', 'A')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGUHI", "ZYXJAQRU")
('J', 'A', 'U')
>>> lcs() #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:
>>> lcs(*"abecd acbed".split())
('a', 'b', 'e', 'd')
>>> lcs("acd", lcs("abecd", "acbed"))
('a', 'd')
>>> lcs(*"abecd acbed acd".split())
('a', 'c', 'd')


All you have to do is Google "longest subsequence".

This is the top link: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

If you have any particular problem understanding it then please ask here, preferably with a more specific question.
