C code - Logic to remove duplicate words from a 2D char array
Hi have a C code in which I have a 2D char array as -
names[100][20] //Currently maximum 100 names, each of 19 characters supported
This array gets filled by some logic with names. I keep track of total number of names found actually(it could be less than 100 names) in a variable names_found.
Now I want to remove duplicate names which might be present. What I plan to do is something like.
for(i=0;i<names_found;i++)
{
for(j=i+1;j<names_found;j++)
{
//开发者_Go百科Then compare(strcmp) each string/name with every other.
//e.g. if there were 4 names the comparisons done would be
//{name[0],name[1]},{name[0],name[2]},{name[0],name[3]}
//{name[1],name[2]} , {name[1],name[3]}
//& {name[2],name[3]}
//And then some more logic to remove duplicate based on result of strcmp results. Don't know what this logic would look like to store the result in place, in same 2D character buffer?
}
}
Is this logic of duplicate word removal, what I am doing correct, functionally?How can I optimize it for speed.
Any better/faster solution.
This is a simple approach. It assumes that the order of the names isn't important:
for (i = 0; i < names_found; i ++)
{
j = i + 1;
while (j < names_found)
{
if (strcmp(names[i], names[j]) == 0)
{
memmove(names + j, names + (names_found - 1), sizeof(names[0]));
-- names_found;
}
else
++ j;
}
}
There are ways and ways to do this faster, but not necessarily for such a small set. Also, your logic for removing the names will probably take longer than you think because it will either cause gaps in the array you will have to work around, or you will need to memmove() your answers back down to fill the gap.
Off hand a Boyer-Moore type search might speed things up, but depending on how fast the strcmp function is, you might not get any benefit from this due to overhead in setting up the lookups and such. If you set things up right, you might be able to use strstr() instead for your search, which probably does use a more advanced search algo.
Basically, your set is so small that optimizations may be a bit premature here.
It is logically ok: for each array element, search if there are other equal in the following elements, and if it is so, remove them; but you need to change dynamically the array size; e.g. if you remove 3 duplicates of the first element, then the remaining number of elements is smaller than names_found
, so that you need to update it accordingly.
You can make it faster if you sort the array (with a fast sorting algorithm, but it may depend on the size of the data) and then duplicates are all "side by side". Using a destination array would be faster, since if you find N duplicates, you do not need to move all the other array element back by N positions (in the worst case you need an array of the same size of the source array).
Another approach is to use a hash container; in this case you need a library (e.g. glib has an hashtable "object"), and your code would look different (e.g. you can "skip" duplicates when you populate the hashtable).
精彩评论