How to check duplicate words in a string in C?
I am solving a problem in C where I have to find duplicate words in a string like

```c
char a[]="This is it This";
```

In the above string "This" appears two times, so I would like to count it as one.
Can anybody suggest how to achieve this?
Here is a program that does what you're asking. It is hard-coded for 4 words of at most 99 characters each; that can easily be changed, I just fit it around your input. I also used `strcmp` and `strcpy`. Both of these functions can be implemented on your own (call them `mystrcmp` and `mystrcpy` and embed them). I looked them up and they are not complex, but they would not add anything to the program and I didn't want to reinvent the wheel, so I'm not rewriting the string functions for you. I did show how to avoid `strtok`, based on the other answer. Finally, I used a simple linear search in the `notInArray` function. For a large data set this is not efficient (you would probably use some type of tree or hash table).
This was compiled under gcc version 4.3.4
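```c
#include <ctype.h>    /* isspace */
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 4   /* hard-coded for the 4 words of the sample input */
#define MAX_LEN   100 /* 99 characters plus the terminating '\0' */

int notInArray(char arr[][MAX_LEN], const char *word, int size);

int main(void) {
    char a[] = "This is a This";
    char *ptr;
    char strarr[MAX_WORDS][MAX_LEN];
    char word[MAX_LEN];
    int pos = 0;
    int count = 0;
    int i;

    memset(strarr, 0, sizeof(strarr));   /* every slot starts as "" */
    printf("%s\n\n", a);

    ptr = a;
    while (*ptr) {
        /* Skip blanks; stop if only blanks were left. */
        while (isspace((unsigned char)*ptr))
            ptr++;
        if (*ptr == '\0')
            break;
        /* Read the next word (at most 99 characters) into word. */
        if (sscanf(ptr, "%99s", word) != 1)
            break;
        /* Store it only if it is new and there is room left. */
        if (pos < MAX_WORDS && notInArray(strarr, word, MAX_WORDS)) {
            strcpy(strarr[pos++], word);
            printf("%s\n", word);
        }
        /* Advance past the word we just read. */
        while (*ptr && !isspace((unsigned char)*ptr))
            ptr++;
    }

    for (i = 0; i < MAX_WORDS; i++) {
        if (*strarr[i]) {
            printf("strarr[%d]=%s\n", i, strarr[i]);
            count++;
        }
    }
    printf("\nUnique wordcount = %d\n", count);
    return 0;
}

/* Linear search: returns 1 if word is not among the non-empty entries of arr. */
int notInArray(char arr[][MAX_LEN], const char *word, int size) {
    int i;
    for (i = 0; i < size; i++) {
        if (*arr[i] && !strcmp(arr[i], word)) {
            return 0;
        }
    }
    return 1;
}
```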
The output looks like:
```
~>a
This is a This

This
is
a
strarr[0]=This
strarr[1]=is
strarr[2]=a

Unique wordcount = 3
```
Enjoy.
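The answer suggests writing your own `mystrcmp` and `mystrcpy` instead of using `<string.h>`. A minimal sketch of what those two could look like follows; the bodies are an assumption (a typical textbook version), not the answerer's code. `mystrcmp` follows the `strcmp` convention of returning a negative, zero, or positive value and compares characters as `unsigned char`; `mystrcpy`, like `strcpy`, assumes the destination buffer is large enough.

```c
/* Hypothetical stand-ins for strcmp/strcpy, named as the answer suggests. */

/* Returns <0, 0 or >0 like strcmp; characters are compared as unsigned char. */
int mystrcmp(const char *a, const char *b) {
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Copies src, including its terminating '\0', into dst and returns dst.
   dst must be large enough to hold src (no bounds checking, like strcpy). */
char *mystrcpy(char *dst, const char *src) {
    char *d = dst;
    while ((*d++ = *src++) != '\0') {
        /* copy one character per iteration */
    }
    return dst;
}
```

Swapping them into the program only requires renaming the `strcpy` call in `main` and the `strcmp` call in `notInArray`.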
I'd probably read words one at a time (e.g., using `sscanf`) and put them into an array (or, if you have a lot more words than you've shown above, a binary search tree). [Edit: just saw your comment. It's still fairly easy without string functions: just scan through the string looking for space/non-space characters to find the words. Annoying, but not major.]
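A minimal sketch of the "scan for space/non-space characters" approach, assuming no `<string.h>` at all: each word is kept as a pointer into the original string plus a length, so nothing is copied, and duplicates are found with a linear search. `count_unique_words`, `same_word`, `struct word_slice`, and the `MAX_WORDS` capacity of 64 are hypothetical names and limits chosen for illustration. The comparison is case-sensitive, so "This" and "this" count as different words.

```c
#include <stddef.h>

#define MAX_WORDS 64   /* hypothetical fixed capacity for distinct words */

/* A word is a (start, length) slice of the input string; nothing is copied. */
struct word_slice {
    const char *start;
    size_t len;
};

static int is_blank(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Compare two slices character by character (no strcmp needed). */
static int same_word(struct word_slice x, struct word_slice y) {
    size_t i;
    if (x.len != y.len)
        return 0;
    for (i = 0; i < x.len; i++)
        if (x.start[i] != y.start[i])
            return 0;
    return 1;
}

/* Returns the number of distinct words in s (case-sensitive).
   Distinct words beyond MAX_WORDS are silently not counted. */
int count_unique_words(const char *s) {
    struct word_slice seen[MAX_WORDS];
    int nseen = 0;

    while (*s) {
        struct word_slice w;
        int i, dup = 0;

        while (*s && is_blank(*s))      /* skip a run of blanks */
            s++;
        if (!*s)
            break;
        w.start = s;                    /* scan a run of non-blanks */
        while (*s && !is_blank(*s))
            s++;
        w.len = (size_t)(s - w.start);

        for (i = 0; i < nseen; i++)     /* linear search for a duplicate */
            if (same_word(seen[i], w)) {
                dup = 1;
                break;
            }
        if (!dup && nseen < MAX_WORDS)
            seen[nseen++] = w;
    }
    return nseen;
}
```

For the question's input, `count_unique_words("This is it This")` returns 3: "This" is only stored once.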
If you want a count of the number of times each word occurs, you can keep an int (or whatever) in each node. If you just want to know the unique words in the input, you don't need a count, just a collection of words.
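A sketch of the binary-search-tree version with a count in each node, assuming words are separated by whitespace and are at most 99 characters long (longer ones get split by the `%99s` conversion). Words are read with `sscanf` and its `%n` conversion, which reports how many characters were consumed so the scan can resume from there. The names `insert_word`, `word_count`, `distinct_words`, `tree_from_string`, and `free_tree` are hypothetical, and `malloc` failures are not checked to keep it short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per distinct word; count records how many times it appeared. */
struct node {
    char *word;
    int count;
    struct node *left, *right;
};

/* Insert a copy of word, or bump its count if it is already present.
   Returns the (possibly new) root. malloc failures are not handled. */
struct node *insert_word(struct node *root, const char *word) {
    int cmp;
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        n->word = malloc(strlen(word) + 1);
        strcpy(n->word, word);
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }
    cmp = strcmp(word, root->word);
    if (cmp < 0)
        root->left = insert_word(root->left, word);
    else if (cmp > 0)
        root->right = insert_word(root->right, word);
    else
        root->count++;   /* duplicate: count it, don't store it again */
    return root;
}

/* Build the tree from a whitespace-separated string. */
struct node *tree_from_string(const char *s) {
    struct node *root = NULL;
    char word[100];
    int used;
    /* %n stores how many characters sscanf consumed, including leading blanks. */
    while (sscanf(s, "%99s%n", word, &used) == 1) {
        root = insert_word(root, word);
        s += used;
    }
    return root;
}

/* Returns how often word occurred, or 0 if it never appeared. */
int word_count(const struct node *root, const char *word) {
    while (root) {
        int cmp = strcmp(word, root->word);
        if (cmp == 0)
            return root->count;
        root = cmp < 0 ? root->left : root->right;
    }
    return 0;
}

/* Number of distinct words is simply the number of nodes. */
int distinct_words(const struct node *root) {
    return root ? 1 + distinct_words(root->left) + distinct_words(root->right) : 0;
}

void free_tree(struct node *root) {
    if (root) {
        free_tree(root->left);
        free_tree(root->right);
        free(root->word);
        free(root);
    }
}
```

For "This is it This" the tree has three nodes and the count stored for "This" is 2. If you only need the set of unique words, drop the `count` field and ignore duplicates in `insert_word`.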