How to not include line breaks when comparing two strings

I am comparing updates to two strings. I did:

 string1 != string2

and they turned out to be different. I put them in "Add Watch" and saw that the only difference is that one has line breaks and the other doesn't:

 string1 = "This is a test. \nThis is a test";
 string2 = "This is a test. This is a test";

I basically want to do a comparison that doesn't include line breaks. So if a line break is the only difference, the strings should be considered equal.


A quick and dirty way, when performance isn't much of an issue:

string1.Replace("\n", "") != string2.Replace("\n", "")


I'd suggest a regex that reduces every run of spaces, tabs, \r and \n to a single space:

Regex.Replace(string1, @"\s+", " ") != Regex.Replace(string2, @"\s+", " ")
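One thing worth noting about the regex approach: it treats a line break as equivalent to a space, not as absent, so "a\nb" and "ab" still compare as different. A minimal sketch, assuming non-null inputs (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceNormalizer
{
    // Collapses every run of whitespace (spaces, tabs, \r, \n) into one space.
    // Assumes s is not null; Regex.Replace throws on a null input.
    public static string Collapse(string s) => Regex.Replace(s, @"\s+", " ");

    // Ordinal comparison of the two collapsed strings.
    public static bool EqualsIgnoringWhitespaceRuns(string a, string b) =>
        string.Equals(Collapse(a), Collapse(b), StringComparison.Ordinal);
}
```

With the strings from the question this returns true, because " \n" collapses to a single space. It would return false for "a\nb" versus "ab".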


Assuming:

  1. The sort of direct char-value-for-char-value comparison of != and == is what is wanted here, except for the matter of newlines.
  2. The strings are, or may be, large enough or compared often enough to make just replacing "\n" with an empty string too inefficient.

Then:

public bool LinelessEquals(string x, string y)
{
    //deal with quickly handlable cases quickly.
    if(ReferenceEquals(x, y))//same instance
        return true;         // - generally happens often in real code,
                             //and is a fast check, so always worth doing first.
    //We already know they aren't both null as
    //ReferenceEquals(null, null) returns true.
    if(x == null || y == null)
        return false;
    //Where() needs "using System.Linq;" - it filters out the newlines lazily.
    IEnumerator<char> eX = x.Where(c => c != '\n').GetEnumerator();
    IEnumerator<char> eY = y.Where(c => c != '\n').GetEnumerator();
    while(eX.MoveNext())
    {
        if(!eY.MoveNext()) //y is shorter
            return false;
        if(eX.Current != eY.Current)
            return false;
    }
    return !eY.MoveNext(); //check if y was longer.
}

This is defined as equality rather than inequality, so you could easily adapt it to be an implementation of IEqualityComparer<string>.Equals. Your question for a linebreak-less string1 != string2 becomes: !LinelessEquals(string1, string2)


Here's an equality comparer for strings that ignores certain characters, such as \r and \n.

This implementation doesn't allocate any heap memory during execution, helping its performance. It also avoids virtual calls through IEnumerable and IEnumerator.

public sealed class SelectiveStringComparer : IEqualityComparer<string>
{
    private readonly string _ignoreChars;

    public SelectiveStringComparer(string ignoreChars = "\r\n")
    {
        _ignoreChars = ignoreChars;
    }

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        var ix = 0;
        var iy = 0;
        while (true)
        {
            while (ix < x.Length && _ignoreChars.IndexOf(x[ix]) != -1)
                ix++;
            while (iy < y.Length && _ignoreChars.IndexOf(y[iy]) != -1)
                iy++;
            if (ix >= x.Length)
                return iy >= y.Length;
            if (iy >= y.Length)
                return false;
            if (x[ix] != y[iy])
                return false;
            ix++;
            iy++;
        }
    }

    public int GetHashCode(string obj)
    {
        throw new NotSupportedException();
    }
}
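Because GetHashCode throws, the comparer above can't be used as the comparer for a Dictionary or HashSet. If that is needed, one possible sketch is to hash only the characters that aren't ignored, so that strings considered equal always get the same hash. The class name HashableSelectiveStringComparer and the simple multiply-and-add hash are assumptions for illustration, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;

public sealed class HashableSelectiveStringComparer : IEqualityComparer<string>
{
    private readonly string _ignoreChars;

    public HashableSelectiveStringComparer(string ignoreChars = "\r\n")
    {
        _ignoreChars = ignoreChars ?? string.Empty;
    }

    // Same two-index walk as the comparer above.
    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        int ix = 0, iy = 0;
        while (true)
        {
            while (ix < x.Length && _ignoreChars.IndexOf(x[ix]) != -1) ix++;
            while (iy < y.Length && _ignoreChars.IndexOf(y[iy]) != -1) iy++;
            if (ix >= x.Length) return iy >= y.Length;
            if (iy >= y.Length) return false;
            if (x[ix] != y[iy]) return false;
            ix++;
            iy++;
        }
    }

    // Hashes only the non-ignored characters, so Equals(x, y) == true
    // implies GetHashCode(x) == GetHashCode(y).
    public int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (char c in obj)
            {
                if (_ignoreChars.IndexOf(c) == -1)
                    hash = hash * 31 + c;
            }
            return hash;
        }
    }
}
```

With this, new HashSet<string>(new HashableSelectiveStringComparer()) would treat "a\r\nb" and "ab" as the same entry.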


A cleaner approach would be to use:

string1.Replace(Environment.NewLine, String.Empty) != string2.Replace(Environment.NewLine, String.Empty);
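A caveat with Environment.NewLine: it is "\r\n" on Windows and "\n" on Unix-like systems, so on Windows a bare "\n" (as in the question's example) would not be removed. A hedged sketch that strips both characters regardless of platform, assuming non-null inputs (the helper names are hypothetical):

```csharp
using System;

public static class LineBreakStripper
{
    // Removes every \r and \n, which covers \r\n, \n and \r line endings.
    // Assumes s is not null.
    public static string StripLineBreaks(string s) =>
        s.Replace("\r", string.Empty).Replace("\n", string.Empty);

    // Ordinal comparison after stripping line breaks from both sides.
    public static bool EqualsIgnoringLineBreaks(string a, string b) =>
        string.Equals(StripLineBreaks(a), StripLineBreaks(b), StringComparison.Ordinal);
}
```

The question's check would then become !LineBreakStripper.EqualsIgnoringLineBreaks(string1, string2).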


This is a generalized and tested version of Jon Hanna's answer.

/// <summary>
/// Compares two character enumerables one character at a time, ignoring those specified.
/// </summary>
/// <param name="x"></param>
/// <param name="y"></param>
/// <param name="ignoreThese"> If not specified, the default is to ignore carriage return and line feed: {'\r', '\n'} </param>
/// <returns></returns>
public static bool EqualsIgnoreSome(this IEnumerable<char> x, IEnumerable<char> y, params char[] ignoreThese)
{
    // First deal with quickly handlable cases quickly:
    // Same instance - generally happens often in real code, and is a fast check, so always worth doing first.
    if (ReferenceEquals(x, y))
        return true;
    // We already know they aren't both null as ReferenceEquals(null, null) returns true.
    if (x == null || y == null)
        return false;
    // Default ignore is newlines:
    if (ignoreThese == null || ignoreThese.Length == 0)
        ignoreThese = new char[] { '\r', '\n' };
    // Filters by specifying enumerator.
    IEnumerator<char> eX = x.Where(c => !ignoreThese.Contains(c)).GetEnumerator();
    IEnumerator<char> eY = y.Where(c => !ignoreThese.Contains(c)).GetEnumerator();
    // Compares.
    while (eX.MoveNext())
    {
        if (!eY.MoveNext()) //y is shorter
            return false;
        if (eX.Current != eY.Current)
            return false;
    }
    return !eY.MoveNext(); //check if y was longer.
}


Can't you just strip out the line breaks before comparing the strings?

E.g. (pseudocode)...

string1.replace('\n','') != string2.replace('\n','')


Here's a version in VB.NET based on Drew Noakes' answer:

Dim g_sIgnore As String = " " & vbCrLf & vbTab ' space, carriage return, line feed, tab

Public Function StringCompareIgnoringWhitespace(s1 As String, s2 As String) As Boolean
    Dim i1 As Integer = 0
    Dim i2 As Integer = 0
    Dim s1l As Integer = s1.Length
    Dim s2l As Integer = s2.Length

    Do
        While i1 < s1l AndAlso g_sIgnore.IndexOf(s1(i1)) <> -1
            i1 += 1
        End While
        While i2 < s2l AndAlso g_sIgnore.IndexOf(s2(i2)) <> -1
            i2 += 1
        End While
        If i1 = s1l AndAlso i2 = s2l Then
            Return True
        Else
            If i1 < s1l AndAlso i2 < s2l AndAlso s1(i1) = s2(i2) Then
                i1 += 1
                i2 += 1
            Else
                Return False
            End If
        End If
    Loop
    Return False
End Function

I also tested it with:

Try
    Debug.Assert(Not StringCompareIgnoringWhitespace("a", "z"))
    Debug.Assert(Not StringCompareIgnoringWhitespace("aa", "zz"))
    Debug.Assert(StringCompareIgnoringWhitespace("", ""))
    Debug.Assert(StringCompareIgnoringWhitespace(" ", ""))
    Debug.Assert(StringCompareIgnoringWhitespace("", " "))
    Debug.Assert(StringCompareIgnoringWhitespace(" a", "a "))
    Debug.Assert(StringCompareIgnoringWhitespace(" aa", "aa "))
    Debug.Assert(StringCompareIgnoringWhitespace(" aa ", " aa "))
    Debug.Assert(StringCompareIgnoringWhitespace(" aa a", " aa a"))
    Debug.Assert(Not StringCompareIgnoringWhitespace("a", ""))
    Debug.Assert(Not StringCompareIgnoringWhitespace("", "a"))
    Debug.Assert(Not StringCompareIgnoringWhitespace("ccc", ""))
    Debug.Assert(Not StringCompareIgnoringWhitespace("", "ccc"))
Catch ex As Exception
    Console.WriteLine(ex.ToString)
End Try


I've run into this problem a number of times when writing unit tests that need to compare multi-line expected strings with the actual output strings.

For example, if I'm writing a method that outputs a multi-line string I care about what each line looks like, but I don't care about the particular newline character used on a Windows or Mac machine.

In my case I just want to assert that each line is equal in my unit tests and bail out if one of them isn't.

public static void AssertAreLinesEqual(string expected, string actual)
{
    using (var expectedReader = new StringReader(expected))
    using (var actualReader = new StringReader(actual))
    {
        while (true)
        {
            var expectedLine = expectedReader.ReadLine();
            var actualLine = actualReader.ReadLine();

            Assert.AreEqual(expectedLine, actualLine);

            if(expectedLine == null || actualLine == null)
                break;
        }
    }
}

Of course, you could also make the method a little more generic and write it to return a bool instead.

public static bool AreLinesEqual(string expected, string actual)
{
    using (var expectedReader = new StringReader(expected))
    using (var actualReader = new StringReader(actual))
    {
        while (true)
        {
            var expectedLine = expectedReader.ReadLine();
            var actualLine = actualReader.ReadLine();

            if (expectedLine != actualLine)
                return false;

            if(expectedLine == null || actualLine == null)
                break;
        }
    }

    return true;
}
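Note that a line-by-line comparison answers a slightly different question from the original one: line endings are interchangeable, but where the lines break still matters, so "a\nb" and "ab" are not equal. The same idea can be sketched more compactly with LINQ. This is only an illustration with made-up names (LineComparison, ReadLines), and it assumes both arguments are non-null:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineComparison
{
    // Yields each line of text. StringReader.ReadLine treats \r, \n and \r\n
    // as line terminators and returns null at the end of the string.
    public static IEnumerable<string> ReadLines(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // True when both strings contain the same lines in the same order.
    public static bool AreLinesEqual(string expected, string actual) =>
        ReadLines(expected).SequenceEqual(ReadLines(actual));
}
```

As with the versions above, a single trailing line break is not significant, because ReadLine returns null right after the last terminator.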

What surprises me most is that there isn't a method like this included in any unit testing framework I've used.


I've had this issue with line endings in a unit test.

//compare files ignoring line ends
    org.junit.Assert.assertEquals(
            read.readPayload("myFile.xml")
                    .replace("\n", "")
                    .replace("\r", ""),
            values.getFile()
                    .replace("\n", "")
                    .replace("\r", ""));

I usually don't like making this kind of comparison (comparing the whole file), as a better approach would be to validate the individual fields. But it answers the question here, as it removes line endings on most systems (the replace calls are the trick).

PS: read.readPayload reads a text file from the resources folder and puts it into a String, and values is a structure that contains a String with the raw content of a file (as String) in its attributes.

PS2: Performance was not considered, since this was just an ugly fix for a unit test.
