How to find a common suffix in two strings?

2023-01-20 00:16 问答作者：

I'm trying to implement something that finds the common suffix between a number of strings, for the sake of illustrations, consider the following:

"The quick brown fox"
"The not so quick brown fox"
"The smelly brown fox"
"The vicious brown fox"

To a human, it's hopefully obvious that the common suffix here is " brown fox", and my naive implementation currently takes the first pair of strings, converts them both to char arrays, and then iterates over these until a character is found to be different, I then create a new string from this, and crop it to the length, reverse it back to the correct order, and return that. I then repeat using the result from the first string with the next string in the list.

Whilst this is loosely O(N), performance of this isn't as good as I'd like, and I wondered before I spend a long time buried in the profiler if I'd missed a quicker way to do this within the .NET framework?

EDIT: Taking out the double reverses (which then means we don't need to convert to char arrays) gives pretty good performance, for the record, my implementation looks a little like:

    private string GetCommonSuffix(string[] 开发者_运维知识库lines)
    {
        int lineCount = lines.GetLength(0);

        string currentSuffix = lines[0];
        int currentSuffixLength = currentSuffix.Length;
        for (int i = 1; i < lineCount; i++)
        {
            string thisLine = lines[i];
            if (!thisLine.EndsWith(currentSuffix))
            {
                int thisLineLength = thisLine.Length;
                int maxPossible = thisLineLength < currentSuffixLength ? thisLineLength : currentSuffixLength;

                if (maxPossible == 0)
                {
                    return string.Empty;
                }

                for (int j = 1; j < maxPossible; j++)
                {
                    if( currentSuffix[ currentSuffixLength - j ] != thisLine[ thisLineLength - j ] )
                    {
                        currentSuffix = currentSuffix.Substring(currentSuffixLength - j + 1, j - 1);
                        currentSuffixLength = j - 1;
                        break;
                    }
                }
            }
        }

        return currentSuffix;
    }

Well, to start with you don't need to convert the strings into char arrays. You can use indexers into the strings to fetch individual characters.

It's probably worth thinking of it as a number rather than a string... each pairwise comparison will give you a maximal value, and the final number (the size of the suffix) is the minimum of those maxima.

So two approaches suggest themselves:

Start with 0 (always valid) and work your way up: check whether 1 is valid (i.e. all strings end with the same character) then move to 2 (by checking the penultimate character) etc
Start with infinity, then do pairwise comparisons to reduce the maximum length. Of course you don't need to do all pairwise comparisons - just a comparison of each string with the first should be fine.

Personally I'd probably go with the first approach though - it won't have as good cache coherency, but I think it'll be better in some situations (e.g. many strings, all but one of which have a long common suffix.

(Of course, once you've got the length, getting the actual substring is very simple.)

Your approach seems OK. You could iterate over all strings, not just two at a time which would save some reverses (and a lot of time if, say, the last string had no common prefix but all the others did - you'd do a lot of work for nothing with the pair wise approach)

There's also no need to reverse the current candidate common suffix until you've completed all the comparisons.

However, you could avoid the reverses by keeping an array of indexes to where you are for each string, initialising each to the length of the string (minus 1) and work backwards from the end, iterating over all the strings.

this might be a good candidate for Memoized recursive functions given that you may be wanting to hold onto previously calculated values.

basic example: http://weblogs.asp.net/podwysocki/archive/2008/08/01/recursing-into-recursion-memoization.aspx

or: http://explodingcoder.com/blog/content/painless-caching-memoization-net

might fit, might be useless :)

继续阅读：.net string

How to find a common suffix in two strings?

更多精彩内容

精彩评论

最新问答

英雄联盟手游二周年什么时候开始?？

小米电视开机黑屏指示灯一闪一闪?？

dnf手游四维是哪四维?？

黑神话悟空ps5什么时候发售?？

结扎后输卵管复通多少钱？

问答排行榜

Escaping "<" in Perl-generated XML

微信重新建群怎么建？

imessage会显示已读吗？

太快了能不能慢一点好爽~好大~不要拔出来了？

二年级家长回音怎么写大全简短的（二年级家长回音怎么写）？