
I want to find the strings that are repeated more than once

I have an array of strings.

Some of the strings are similar (for example, person is similar to twolegperson, and animal is similar to animalgold).

I want to find the strings that are repeated more than once (here, person and animal).

Thank you very much faty


You need a generalized suffix tree. For implementations, see this question.


Naive pseudocode algorithm:

int minMatchLen = 3;   // The minimum length of string match required
string stringArray[] = {"person", "twolegperson", "animal", "animalgold"};
for (i = 0; i < stringArray.length; i++) {
    int strLen = stringArray[i].length;
    // Every start position that leaves room for at least minMatchLen characters
    for (substrIndex = 0; substrIndex <= strLen - minMatchLen; substrIndex++) {
        // Every length from minMatchLen up to the end of the string (inclusive)
        for (substrLen = minMatchLen; substrLen <= strLen - substrIndex; substrLen++) {
            string subString = stringArray[i].substr(substrIndex, substrLen);
            bool matchFound = false;
            // Only search the later strings, so each pair is compared once
            for (j = i + 1; j < stringArray.length; j++) {
                if (stringArray[j].contains(subString)) {
                    print("String '" + subString + "' found in '" + stringArray[j] + "'");
                    matchFound = true;
                }
            }
            if (matchFound) print("String '" + subString + "' found in '" + stringArray[i] + "'");
        }
    }
}

This goes through each string in the array, extracts every substring of at least the specified minimum length, and then searches the remaining strings in the array for each of those substrings. Note that it reports every shared substring (per, pers, perso, and so on), not only the longest one. I'm sure there are more elegant and efficient solutions, but this will get the job done. It'll probably be slow for a large array, though.
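For anyone who wants to run the naive idea, here is a minimal Python sketch of it. It is not the pseudocode translated line by line: instead of calling contains on every later string, it records each substring in a dictionary together with the indices of the strings it came from, which finds the same shared substrings. The function names shared_substrings and maximal_only are made up for this illustration, and the second function is an added assumption about what the asker wants (only the longest matches such as person and animal, rather than every fragment).

```python
from collections import defaultdict


def shared_substrings(strings, min_len=3):
    """Return {substring: set of string indices} for every substring of at
    least min_len characters that occurs in more than one input string."""
    owners = defaultdict(set)
    for idx, s in enumerate(strings):
        # Every start position that leaves room for min_len characters.
        for start in range(len(s) - min_len + 1):
            # Every end position from the minimum length up to the full tail.
            for end in range(start + min_len, len(s) + 1):
                owners[s[start:end]].add(idx)
    # Keep only substrings seen in at least two different strings.
    return {sub: ids for sub, ids in owners.items() if len(ids) > 1}


def maximal_only(shared):
    """Drop a substring when a longer shared substring contains it and
    occurs in at least the same strings (hypothetical post-filter)."""
    return {
        sub: ids
        for sub, ids in shared.items()
        if not any(
            sub != other and sub in other and ids <= other_ids
            for other, other_ids in shared.items()
        )
    }


strings = ["person", "twolegperson", "animal", "animalgold"]
for sub, ids in maximal_only(shared_substrings(strings)).items():
    print(sub, sorted(ids))
# person [0, 1]
# animal [2, 3]
```

Like the pseudocode, this generates a quadratic number of substrings per string, so it is only reasonable for short strings and small arrays; a generalized suffix tree is the better fit for large inputs.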
