开发者

Find 'new' items from two containers

I have two containers (the actual container is flexible, unsorted vs sorted doesn't matter to me, so whatever works best for answering my question is what I'll use) which contain some data. I want to compare these two containers, and either remove all 'duplicates' from the second, or create a new container with only the 'new' values.

By duplicate/newI mean the following: Container 1 contains: [1, 2, 4, 8, 16] Container 2 contains: [1, 2, 4, 16, 32]

After running the algorithm, the new container (or modified container 2) should contain: Container 3 contains开发者_StackOverflow中文版: [32]

Notice that I do NOT want '8' to be in the new container (or modified container) as I only want to find the 'new' values.

I could easily implement a naive and slow program to do this myself, however I'm looking for the most elegant and efficient way to achieve this (Boost is fine if the STL does not provide all the necessary tools/algos without rolling your own, otherwise rolling your own is fine too).

So... What would be the 'best' (read: most elegant and efficient) way to do this?

Thanks in advance.

P.S. If it is at all relevant, I'm using this to write a 'diffing' tool for exported functions from a DLL. I have a number of very large DLLs and I want to find the 'new' exports in the latest builds of those DLLs.


Looks like STL set_difference might be right for you. Example from here:

// set_difference example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main () {
  int first[] = {5,10,15,20,25};
  int second[] = {50,40,30,20,10};
  vector<int> v(10);                           // 0  0  0  0  0  0  0  0  0  0
  vector<int>::iterator it;

  sort (first,first+5);     //  5 10 15 20 25
  sort (second,second+5);   // 10 20 30 40 50

  it=set_difference (first, first+5, second, second+5, v.begin());
                                               // 5 15 25  0  0  0  0  0  0  0

  cout << "difference has " << int(it - v.begin()) << " elements.\n";

  return 0;
}


The simplest method would likely be to sort and then iterate. As the two containers are both sorted, you can simply directly compare each index (or de-referenced iterator) for equality, and insert into new (or remove from existing) only if not equal. This is O(n logn) and depends on operator< and operator==.


hash_table in STL can solve this problem.

  1. Insert all the elements in container one into the hash_table.
  2. For every element in container two, check whether it's in the hash_table or not; if not, push it into container three.

The overall time complexity is O(n). The sort and compare method have time complexity O(nlogn).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜