开发者

String Sort based on some Format

I have a string which needs to be sorted based on the sort_fmt. Ex: If the string is 'abdcdfs' & the sort_fmt is 'dacg'. Upon sort, the output should be 'ddacfbs'. As you see, there might be characters in input string that are not present in the order string and vice-versa. The characters of input string that are not present in the order string should come at the end of the output string in any order.

Here's w开发者_C百科hat I have written. It works, It's O(n*m) algo. I was wondering is there are better & shorter ways to do this? Maybe use itertools?

def sort_str(s, sort_fmt):
    sorted_str = ''
    str_hash   = dict()

    # O(n)
    for ch in s:
        if ch in str_hash:
            str_hash[ch] += 1
        else:
            str_hash[ch] = 1

    # O(m) + O(1) where m<=n
    for ch in sort_fmt:
        if ch in str_hash:
            cnt = str_hash[ch]
            sorted_str += cnt * ch

    # O(n)
    for ch in s:
        if ch not in sort_fmt:
            sorted_str += ch
    return sorted_str


if __name__ == '__main__':
    print sort_str('abdcdfs', 'dacg')


You are trying to implement counting sort which is indeed O(n) under certain conditions. However your implementation has two bugs near the end which mean that the actual time complexity of your implementation is O(n2 + n*m):

for ch in s:
    if ch not in sort_fmt:  # <--- "in" requires a linear search. O(n*m)
        sorted_str += ch    # <--- Ouch! Concatenation! O(n^2)
  • You are constructing the result in an inefficient way because you are using concatenation in a loop.
  • Using in on a string is linear in the length of the string, and you are doing this in a loop.

Try this instead. It requires Python 2.7 or newer because of the use of collections.Counter, but the Counter can easily be replaced with a defaultdict for older versions of Python):

from collections import Counter

def sort_str(s, sort_fmt):
    counter = Counter(s)
    d = set(sort_fmt)
    result = ''.join(c * counter[c] for c in sort_fmt)
    result += ''.join(c for c in s if c not in d)
    return result

if __name__ == '__main__':
    print sort_str('abdcdfs', 'dacg')

Here's a more concise way to get the result you want if you drop the requirement that it should be O(n):

>>> d = dict((v,k) for (k,v) in enumerate('dacg'))
>>> sorted('abdcdfs', key = lambda c:d.get(c, len(d)))
['d', 'd', 'a', 'c', 'b', 'f', 's']


I am not sure about the complexity of sorted. This works

def sort_str(s, frmt):
    l = len(frmt)
    return sorted(s, key = lambda x: frmt.index(x) if x in frmt else l)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜