Removing repeated characters in a string without using recursion

You are given a string. Develop a function to remove duplicate characters from that string. The string could be of any length. Your algorithm must work in place; if you wish, you may use a constant amount of extra space that does not depend in any way on the string's size. Your algorithm must have a complexity of O(n).

My idea was to define an integer array of size 26, where index 0 corresponds to the letter a and index 25 to the letter z, and to initialize all the elements to 0. We then traverse the entire string once, incrementing the value at the corresponding index whenever we encounter a letter.

Then we traverse the string a second time: if the value at the corresponding index is 1, we print the letter; otherwise we do not.

In this way the time complexity is O(n) and the space used is constant, irrespective of the length of the string!
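As a concrete illustration of this idea, here is a minimal sketch in Ruby (assuming, as described, that the input contains only the lowercase letters a–z; the function name remove_dups is just for illustration):

def remove_dups(s)
  counts = Array.new(26, 0)                                        # fixed-size array, independent of string length
  s.each_char { |c| counts[c.ord - 'a'.ord] += 1 }                 # first pass: count occurrences
  result = ""
  s.each_char { |c| result << c if counts[c.ord - 'a'.ord] == 1 }  # second pass: keep characters seen exactly once
  result
end

puts remove_dups("programming")   # => "poain"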

If anyone can come up with ideas of better efficiency, it would be very helpful!


Your solution definitely meets the O(n) time requirement. Instead of an array, which would be very, very large if the allowed alphabet is large (Unicode has over a million characters), you could use a plain hash. Here is your algorithm in (unoptimized!) Ruby:

def undup(s)
  seen = Hash.new(0)                                # default count of 0 for unseen characters
  s.each_char { |c| seen[c] += 1 }                  # first pass: count each character
  result = ""
  s.each_char { |c| result << c if seen[c] == 1 }   # second pass: keep only characters that occur once
  result
end

puts(undup "")
puts(undup "abc")
puts(undup "Olé")
puts(undup "asdasjhdfasjhdfasbfdasdfaghsfdahgsdfahgsdfhgt")

It makes two passes through the string, and since a hash lookup takes expected constant time, you're good.

You can say the Hashtable (like your array) uses constant space, albeit large, because it is bounded above by the size of the alphabet. Even if the size of the alphabet is larger than that of the string, it still counts as constant space.

There are many variations to this problem, many of which are fun. To do it truly in place, you can sort first; this gives O(n log n). There are variations on merge sort where you ignore dups during the merge. In fact, this "no external hashtable" restriction appears in "Algorithm: efficient way to remove duplicate integers from an array" (also tagged interview question).
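For illustration, here is a minimal sketch of the sort-first variant (the name undup_sorted is illustrative; it assumes, as in the question, that removing duplicates means dropping every character that occurs more than once; sorting discards the original order, and this version still builds the output in a separate string for clarity):

def undup_sorted(s)
  chars = s.chars.sort                    # O(n log n); no hash table needed
  result = ""
  chars.each_with_index do |c, i|
    # keep c only if it differs from both neighbours, i.e. it occurs exactly once
    result << c if (i == 0 || chars[i - 1] != c) && (i == chars.size - 1 || chars[i + 1] != c)
  end
  result
end

puts undup_sorted("abcabd")   # => "cd"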

Another common interview question starts with a simple string, then they say, okay now a million character string, okay now a string with 100 billion characters, and so on. Things get very interesting when you start considering Big Data.

Anyway, your idea is pretty good. It can generally be tweaked as follows: use a set, not a dictionary. Go through the string; for each character, if it is not in the set, add it, and if it is, delete it. Sets take up less space, don't need counters, can be implemented as bitsets if the alphabet is small, and this algorithm does not need two passes.

Python implementation: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
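The linked recipe keeps one copy of each element (the first occurrence) in a single pass. A rough Ruby equivalent of that set-based, single-pass approach might look like this (the name dedup is illustrative; note that keeping the first occurrence is a slightly different reading of "remove duplicates" than the counting approach above, which drops every repeated character entirely):

require 'set'

def dedup(s)
  seen = Set.new
  result = ""
  s.each_char do |c|
    next if seen.include?(c)   # already emitted once; skip further occurrences
    seen.add(c)
    result << c                # keep the first occurrence only
  end
  result
end

puts dedup("asdasjhdfasjhdfasbfdasdfaghsfdahgsdfahgsdfhgt")   # => "asdjhfbgt"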


You can also use a bitset instead of the additional array to keep track of found chars. Depending on which characters (a–z or more) are allowed, you size the bitset accordingly. This requires less space than an integer array.
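A minimal sketch of that idea in Ruby, assuming only lowercase a–z input and using plain integers as bitsets (one mask for characters seen at least once, one for characters seen more than once; the name undup_bitset is just for illustration):

def undup_bitset(s)
  seen = 0    # bit i set => letter ('a' + i) seen at least once
  dups = 0    # bit i set => letter ('a' + i) seen more than once
  s.each_char do |c|
    bit = 1 << (c.ord - 'a'.ord)
    dups |= bit if (seen & bit) != 0
    seen |= bit
  end
  result = ""
  s.each_char do |c|
    bit = 1 << (c.ord - 'a'.ord)
    result << c if (dups & bit) == 0    # keep only characters that were never repeated
  end
  result
end

puts undup_bitset("abcabd")   # => "cd"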
