开发者

Give two strings s1 and s2 , give an algorithm to find all characters from S1 which are also in S2

Exampl开发者_运维问答e :

S1 : abcde S2: cdef

Answer : cde


It's reasonable to assume the set of characters is small and finite compared to the potential string length. So, handling 8-bit characters for example (in roughly-C++ pseudo-code):

bool in_a[256] = { false };
bool in_b[256] = { false };
for (int i = 0; i < a.size(); ++i)
    in_a[a[i]] = true;
for (int i = 0; i < b.size(); ++i)
    in_b[b[i]] = true;
// and in_a and in_b
for (int i = 0; i < b.size(); ++i)
    if (in_a[i] && in_b[i])
    {
        std::cout << i;
        if (isprint(i)) std::cout << '\'' << (char)i << '\'';
        std::cout << ' ';
    }
std::cout << '\n';

Note that using a hash table rather than an array is a huge waste of time (unless handling say a 32-bit character representation).

An implementation of the simplification suggested in Yordan's comment follows. This avoids the final loop from 0..255 and needs only one tracking array. The order of results is unsorted.

#include <vector>
#include <iostream>

int main()
{
    std::string a = "abdcdefg";
    std::string b = "byfdz";
    bool track[256] = { false };
    for (int i = 0; i < a.size(); ++i)
        track[a[i]] = true;
    for (int i = 0; i < b.size(); ++i)
        if (track[b[i]])
        {
            track[b[i]] = false; // don't match again
            std::cout << (int)b[i];
            if (isprint(b[i])) std::cout << " \'" << (char)(b[i]) << "\'";
            std::cout << ' ';
        }
    std::cout << '\n';
}


Use some sort of set data structure: Fill the set with each character of S1. Then check for each character in S2 if it is in the set.

Depending on the implementation, both insertion and lookup on the set do only cost O(1).


Sort both of the strings into the same order, and then linearly scan through both sequences at the same time comparing each item, if it isn't the same, increment the sequence with the lower value character. It should be O(N log N) at a guess.


Off the top of my head I'd split both strings into a character array and sort them. I'd then go through one array and for each character check if its in the other. Having both arrays sorted means that you only need to loop through each array once when getting your result. I'm not sure if the overhead of sorting makes it any more efficient though.

Things that I have considered might be relevant are:

1) duplicate characters in a string - dismissed because off the top of my head I couldn't think of a nice efficient way of filtering out duplicates.

2) Different cases might want to be considered equal (eg A and a are considered the same) - not asked for but a toupper or tolower will solve that.


In ruby, algorithm O(len(s1) + len(s2)) using a hash table

require 'jcode'
def toto(s1, s2)
    contains={}
    s2.each_char { |x| contains[x]=1 }
    res=""
    s1.each_char do |x|
        if contains[x]==1
            res<<=x
        end
    end
    return res
end

puts toto("abcde","cdef")


Another method suggests itself to me now... In pseudo code since its easier to explain:

for each (character in stringA)
{
    charHashTable[character] = 1

}
for each (character in stringB)
{
    if (charHashTable[character]==1)
        if (!OutputArray.Contains(character))
            OutputArray.Add(character)
}

I'm not 100% sure of the performance characteristics of putting stuff into hash arrays but it occured to me that this might be a better way to efficiently find if a character is in one array than sorting it and doing it that way. If anybody can comment on the performance benefits of this method over the sorting both strings one I'd be very interested.


Use a hash map where the keys are the characters of the first string and the values are an int starting at zero. Then just look up each character from S2 and if the lookup succeeds then increment the value. Setting up the hash map is O(n) and each lookup is O(1). Worst case performance is about O(2n) which equivalent to O(n).

Just for reference, you could you the Boost unordered_map container in the real world.


If you know the number of unique characters that the strings are made up of (e.g.: If using only lower case characters then a-z means 26 unique symbols) then have an array of that size using the symbols as the index. Iterate over each string and for each of it's characters increment the corresponding array location (say you find b then a[1]++). In the end count the number of array positions which have a value of 2. Of course this solution has more space complexity.

Edit: Count the number of array positions that have 2 or more as the value. (Thanks to Chris for pointing out the flaw in the earlier answer)


var commonChars = S1.Intersect(S2);

EDIT: This is from System.Linq - and it is an algorithm - it's a single, one-line algorithm. :-)


Try doing this in Python:

char_in_s2=''

for char in s1:

      if char in s2:
         char_in_s2=char_in_s2+char

print(char_in_s2)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜