Creating a sortable string from multiple strings in .Net
I need a function that converts an array of strings into a string that is sortable in the same order as if you would sort the inputs (sort first input argument, if equal sort second etc...)
In native code, separating the strings with \0 would do, but somehow
("a" + char.MinVa开发者_运维技巧lue + "2").CompareTo("a1") equals to 1!
What is going on and is it possible to create such a function?
public static string StringsToKey(params string[] values)
EDIT: This is the test I want to succeed:
Assert.IsTrue(MiscUtils.StringsToKey("a", "2").CompareTo(MiscUtils.StringsToKey("a1")) < 0);
I would like to avoid using CompareOrdinal because I'm not always in control of how the key would be sorted. Also, ordinal might yield incorrect sorting order on international sets...
public static string StringsToKey(params string[] values)
{
return string.Join(char.MinValue.ToString(), values);
}
usage should work:
var s1 = StringsToKey("abc", "def", "1234");
var s2 = StringsToKey("ab", "cde", "f1234");
var comparisonResult = string.Compare(s1, s2, StringComparison.Ordinal);
EDIT:
var s1 = StringsToKey("a", "2");
var s2 = StringsToKey("a1");
var r = string.Compare(s1, s2, StringComparison.Ordinal);
returns a negative value(-49
).
Edit II:
as Joe mentioned in comments, this won't work if the string contains '\0'
(or '\t'
if tab used as delimiter). So, there is no such a delimiter which works in all cases. So, I rewrote the function, but now it has 2 parameters but still uses CompareTo
method for comparison:
public static int CompareStringSequences(
IEnumerable<string> first,
IEnumerable<string> second)
{
var x = Enumerable.Zip(first, second, (s1, s2) => s1.CompareTo(s2))
.FirstOrDefault(i => i != 0);
return x == 0 ?
x1.Length - x2.Length :
x;
}
Unlike null-terminated C-style strings, a .NET string may contain null characters, so in the general case, there is no separator that will guarantee to sort lower than any valid character in a string.
Of course if you're sure your strings don't contain nulls and ordinal sorting is OK, some of the solutions already posted will work.
But I'd say don't even attempt it. Writing a comparer for your string arrays will be more efficient (avoiding creating the temporary strings) and better expresses your intent. There are lots of ways this could be done, including the following (compares IList<string> rather than string[] for more flexibility, and supports culture-sensitive comparisons):
public class StringListComparer : IComparer<IList<string>>
{
private StringComparer comparer;
public StringListComparer()
: this(StringComparer.Ordinal)
{
}
public StringListComparer(StringComparer comparer)
{
if (comparer == null) throw new ArgumentNullException("comparer");
this.comparer = comparer;
}
public int Compare(IList<string> x, IList<string> y)
{
int result;
for (int i = 0; i < Math.Min(x.Count, y.Count); i++)
{
result = comparer.Compare(x[i], y[i]);
if (result != 0) return result;
}
return x.Count.CompareTo(y.Count);
}
}
The above code is untested and possibly buggy, but I'm sure you get the idea.
the String.CompareTo() method returns an int comparing the two values to the length of the shortest string. Given that '\0' is "less than" the character '1' it follows it in normal string order. You can use the String.Split() method to split on any character. including '\0'. You can then sort the strings using the Array.Sort() method.
Basically, if using "ordinal sorting", the operation you want is to simply concatenate all the strings. To illustrate, let's take two arrays of strings:
a={"My", "dog", "is", "black"}
b={"My", "cat", "sleeps", "often"}
Comparing each string in these arrays until you find one that differs:
a={"My", "dog", "is", "black"}
b={"My", "cat", "sleeps", "often"}
^
first difference: b is "less than" a
... is exactly equal to comparing the concatenated strings:
a="Mydogisblack"
b="Mycatsleepsoften"
^
First difference; again, b < a
Your method's code would simply be:
public static string StringsToKey(params string[] values)
{
var builder = new StringBuilder();
foreach(var s in values)
builder.Append(s + "\0"); //the \0 is ASCII 0, used to delimit words
return builder;
}
This code seems to work for situations when input does not contain tabs:
public static string StringsToKey(params string[] values)
{
return values.Aggregate("", (current, value) => current + value + '\t');
}
精彩评论