What is a simple C library for a set of integer sets?
I've got to modify a C program and I need to include a set of unsigned integer sets. That is, I have millions of sets of integers (each of these integer sets contains between 3 and 100 integers), and I need to store these in some structure, lets call it the directory, that can in logarithmic time tell me whether a given integer set already exist开发者_StackOverflow中文版s in the directory. The only operations that need to be defined on the directory is lookup and insert.
This would be easy in languages with built-in support for useful data structures, but I'm a foreigner to C and looking around on Google did (surprisingly) not answer my question satisfactorily. This project looks about right:
http://uthash.sourceforge.net/
but I would need to come up with my own hash key generator.
This is a standard, simple problem, so I hope there is a standard and simple solution.
It depends on what you are going to do with the data. But maybe tsearch does already what you want. You could also build a sorted array for each set and look up the values with bsearch, although the performance may suffer during the insertion.
EDIT: If you are looking for an (external) library, you'll find a comparision of some C and C++ hash table implementation here. The author of the article has written a generic header implementation called khash. So you're compiled binary don't have any additional dependencies.
EDIT: sorry, I started answering as it's C++ and not C. Yes then you should find your hash function and code it by yourself.. since you already know the average dimension of a set it's not so difficult, just choose a good hash function! But you'll need to codify a whole set in a single number if you want to check if a directory is already there.
You can try by iteratively hashing the single numbers of the set:
int hashcode = initvalue
for (int i = 0; i < 0; ++i)
hashcode = calc_code(hashcode, number_set[i], i);
in a way that the hashfunction depends on its previous value, the current number and the current index.
What about STL sets?
#include <set>
int nums[6] = {1,6,34,2,67,41};
set<int> numbers;
for( int i = 0; i < 6; ++i ) numbers.insert(nums[i]);
for( set<int>::const_iterator iter = numbers.begin(); iter != numbers.end(); ++iter )
cout << *iter << ' ';
Using this data structure you can easily store all your sets, but you need also a way to check if a set is already included in the directory. It's not clear: do you want to know if a set that have all the SAME elements exists already in the directory?
You can do it manually by checking all the elements but since you have millions of them you should find a way to hash the elements of the set in an unique number and use a map of sets..
If I understand you correctly, you want to represent a set of sets of integer which I don't think is particularly trivial.
The first point is to represent a set of integers. The simplest way would be use a variable size array like this:
typedef struct {
int size;
int elems[1];
} intset;
than you can create a new set (with a fixed number of elements) with
intset *newset(int size)
{
intset *set;
set = malloc(sizeof(intset) + sizeof(int)*(size-1));
if (set) set->size = size;
return set;
}
and store the elements with set->elems[0]=i1; ...
.
Another option would be to use bit arrays but the implementation would depend of the nature of the integers to store (e.g. are they within a fixed range? Do they usually appear in groups in a set?).
Once you have your set of integers you will need a compare function (to determine whether two sets have the same elements). If you opted for an array to represent a set and you keep that array sorted, is quite simple to check if two sets are identical; if it's a bitmap, it will depend on how you implemented it.
Now, for the set of sets you can choose a (sorted) vector, that you might need to resize from time to time while inserting elements, or an hash table. In the latter case you'll need to write an hash function for your sets of integer (possibly using existing functions!).
As I said, it seems not trivial to me, I'm not surprised google didn't help.
It's not extremely complicated, though, you just have to take some decisions before proceeding.
Implement a simple hash table yourself. It will make you a better programmer, when you know how to implement one on your own.
http://en.wikipedia.org/wiki/Hash_table
精彩评论