String Sort based on some Format
I have a string which needs to be sorted based on the sort_fmt
. Ex: If the string is 'abdcdfs' & the sort_fmt is 'dacg'. Upon sort, the output should be 'ddacfbs'. As you see, there might be characters in input string that are not present in the order string and vice-versa. The characters of input string that are not present in the order string should come at the end of the output string in any order.
Here's w开发者_C百科hat I have written. It works, It's O(n*m) algo. I was wondering is there are better & shorter ways to do this? Maybe use itertools
?
def sort_str(s, sort_fmt):
sorted_str = ''
str_hash = dict()
# O(n)
for ch in s:
if ch in str_hash:
str_hash[ch] += 1
else:
str_hash[ch] = 1
# O(m) + O(1) where m<=n
for ch in sort_fmt:
if ch in str_hash:
cnt = str_hash[ch]
sorted_str += cnt * ch
# O(n)
for ch in s:
if ch not in sort_fmt:
sorted_str += ch
return sorted_str
if __name__ == '__main__':
print sort_str('abdcdfs', 'dacg')
You are trying to implement counting sort which is indeed O(n) under certain conditions. However your implementation has two bugs near the end which mean that the actual time complexity of your implementation is O(n2 + n*m):
for ch in s:
if ch not in sort_fmt: # <--- "in" requires a linear search. O(n*m)
sorted_str += ch # <--- Ouch! Concatenation! O(n^2)
- You are constructing the result in an inefficient way because you are using concatenation in a loop.
- Using
in
on a string is linear in the length of the string, and you are doing this in a loop.
Try this instead. It requires Python 2.7 or newer because of the use of collections.Counter
, but the Counter
can easily be replaced with a defaultdict
for older versions of Python):
from collections import Counter
def sort_str(s, sort_fmt):
counter = Counter(s)
d = set(sort_fmt)
result = ''.join(c * counter[c] for c in sort_fmt)
result += ''.join(c for c in s if c not in d)
return result
if __name__ == '__main__':
print sort_str('abdcdfs', 'dacg')
Here's a more concise way to get the result you want if you drop the requirement that it should be O(n):
>>> d = dict((v,k) for (k,v) in enumerate('dacg'))
>>> sorted('abdcdfs', key = lambda c:d.get(c, len(d)))
['d', 'd', 'a', 'c', 'b', 'f', 's']
I am not sure about the complexity of sorted. This works
def sort_str(s, frmt):
l = len(frmt)
return sorted(s, key = lambda x: frmt.index(x) if x in frmt else l)
精彩评论