Alphanumeric sorting using LINQ
I have a string[]
in which every elements ends with some numeric value.
string[] partNumbers = new string[]
{
"ABC10", "ABC1","ABC2", "ABC11","ABC10", "AB1", "AB2", "Ab11"
};
I am trying to sort the above array as follows using LINQ
but I am not ge开发者_如何学Gotting the expected result.
var result = partNumbers.OrderBy(x => x);
Actual Result:
AB1
Ab11 AB2 ABC1 ABC10 ABC10 ABC11 ABC2
Expected Result
AB1
AB2 AB11 ..
That is because the default ordering for string is standard alpha numeric dictionary (lexicographic) ordering, and ABC11 will come before ABC2 because ordering always proceeds from left to right.
To get what you want, you need to pad the numeric portion in your order by clause, something like:
var result = partNumbers.OrderBy(x => PadNumbers(x));
where PadNumbers
could be defined as:
public static string PadNumbers(string input)
{
return Regex.Replace(input, "[0-9]+", match => match.Value.PadLeft(10, '0'));
}
This pads zeros for any number (or numbers) that appear in the input string so that OrderBy
sees:
ABC0000000010
ABC0000000001
...
AB0000000011
The padding only happens on the key used for comparison. The original strings (without padding) are preserved in the result.
Note that this approach assumes a maximum number of digits for numbers in the input.
If you want to sort a list of objects by a specific property using LINQ and a custom comparer like the one by Dave Koelle you would do something like this:
...
items = items.OrderBy(x => x.property, new AlphanumComparator()).ToList();
...
You also have to alter Dave's class to inherit from System.Collections.Generic.IComparer<object>
instead of the basic IComparer
so the class signature becomes:
...
public class AlphanumComparator : System.Collections.Generic.IComparer<object>
{
...
Personally, I prefer the implementation by James McCormack because it implements IDisposable, though my benchmarking shows that it is slightly slower.
You can use PInvoke to get fast and good result:
class AlphanumericComparer : IComparer<string>
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
static extern int StrCmpLogicalW(string s1, string s2);
public int Compare(string x, string y) => StrCmpLogicalW(x, y);
}
You can use it like AlphanumComparatorFast
from the answer above.
You can PInvoke to StrCmpLogicalW
(the windows function) to do this. See here: Natural Sort Order in C#
public class AlphanumComparatorFast : IComparer
{
List<string> GetList(string s1)
{
List<string> SB1 = new List<string>();
string st1, st2, st3;
st1 = "";
bool flag = char.IsDigit(s1[0]);
foreach (char c in s1)
{
if (flag != char.IsDigit(c) || c=='\'')
{
if(st1!="")
SB1.Add(st1);
st1 = "";
flag = char.IsDigit(c);
}
if (char.IsDigit(c))
{
st1 += c;
}
if (char.IsLetter(c))
{
st1 += c;
}
}
SB1.Add(st1);
return SB1;
}
public int Compare(object x, object y)
{
string s1 = x as string;
if (s1 == null)
{
return 0;
}
string s2 = y as string;
if (s2 == null)
{
return 0;
}
if (s1 == s2)
{
return 0;
}
int len1 = s1.Length;
int len2 = s2.Length;
int marker1 = 0;
int marker2 = 0;
// Walk through two the strings with two markers.
List<string> str1 = GetList(s1);
List<string> str2 = GetList(s2);
while (str1.Count != str2.Count)
{
if (str1.Count < str2.Count)
{
str1.Add("");
}
else
{
str2.Add("");
}
}
int x1 = 0; int res = 0; int x2 = 0; string y2 = "";
bool status = false;
string y1 = ""; bool s1Status = false; bool s2Status = false;
//s1status ==false then string ele int;
//s2status ==false then string ele int;
int result = 0;
for (int i = 0; i < str1.Count && i < str2.Count; i++)
{
status = int.TryParse(str1[i].ToString(), out res);
if (res == 0)
{
y1 = str1[i].ToString();
s1Status = false;
}
else
{
x1 = Convert.ToInt32(str1[i].ToString());
s1Status = true;
}
status = int.TryParse(str2[i].ToString(), out res);
if (res == 0)
{
y2 = str2[i].ToString();
s2Status = false;
}
else
{
x2 = Convert.ToInt32(str2[i].ToString());
s2Status = true;
}
//checking --the data comparision
if(!s2Status && !s1Status ) //both are strings
{
result = str1[i].CompareTo(str2[i]);
}
else if (s2Status && s1Status) //both are intergers
{
if (x1 == x2)
{
if (str1[i].ToString().Length < str2[i].ToString().Length)
{
result = 1;
}
else if (str1[i].ToString().Length > str2[i].ToString().Length)
result = -1;
else
result = 0;
}
else
{
int st1ZeroCount=str1[i].ToString().Trim().Length- str1[i].ToString().TrimStart(new char[]{'0'}).Length;
int st2ZeroCount = str2[i].ToString().Trim().Length - str2[i].ToString().TrimStart(new char[] { '0' }).Length;
if (st1ZeroCount > st2ZeroCount)
result = -1;
else if (st1ZeroCount < st2ZeroCount)
result = 1;
else
result = x1.CompareTo(x2);
}
}
else
{
result = str1[i].CompareTo(str2[i]);
}
if (result == 0)
{
continue;
}
else
break;
}
return result;
}
}
USAGE of this Class:
List<string> marks = new List<string>();
marks.Add("M'00Z1");
marks.Add("M'0A27");
marks.Add("M'00Z0");
marks.Add("0000A27");
marks.Add("100Z0");
string[] Markings = marks.ToArray();
Array.Sort(Markings, new AlphanumComparatorFast());
For those who likes a generic approach, adjust the AlphanumComparator to Dave Koelle : AlphanumComparator slightly.
Step one (I rename the class to non-abbreviated and taking a TCompareType generic type argument):
public class AlphanumericComparator<TCompareType> : IComparer<TCompareType>
The next adjustments is to import the following namespace:
using System.Collections.Generic;
And we change the signature of the Compare method from object to TCompareType:
public int Compare(TCompareType x, TCompareType y)
{ .... no further modifications
Now we can specify the right type for the AlphanumericComparator. (It should actually be called AlphanumericComparer I think), when we use it.
Example usage from my code:
if (result.SearchResults.Any()) {
result.SearchResults = result.SearchResults.OrderBy(item => item.Code, new AlphanumericComparator<string>()).ToList();
}
Now you have an alphanumeric comparator (comparer) that accepts generic arguments and can be used on different types.
And here is an extension method for using the comparator:
/// <summary>
/// Returns an ordered collection by key selector (property expression) using alpha numeric comparer
/// </summary>
/// <typeparam name="T">The item type in the ienumerable</typeparam>
/// <typeparam name="TKey">The type of the key selector (property to order by)</typeparam>
/// <param name="coll">The source ienumerable</param>
/// <param name="keySelector">The key selector, use a member expression in lambda expression</param>
/// <returns></returns>
public static IEnumerable<T> OrderByMember<T, TKey>(this IEnumerable<T> coll, Func<T, TKey> keySelector)
{
var result = coll.OrderBy(keySelector, new AlphanumericComparer<TKey>());
return result;
}
Well looks like its doing a Lexicographical Ordering irrespective to small or capital chars.
You can try using some custom expression in that lambda to do that.
There's no natural way to do this in .NET, but have a look at this blog post on natural sorting
You could put this into an extension method and use that instead of OrderBy
Looks like Dave Koelle's code link is dead. I got the last version from WebArchive.
/*
* The Alphanum Algorithm is an improved sorting algorithm for strings
* containing numbers. Instead of sorting numbers in ASCII order like
* a standard sort, this algorithm sorts numbers in numeric order.
*
* The Alphanum Algorithm is discussed at http://www.DaveKoelle.com
*
* Based on the Java implementation of Dave Koelle's Alphanum algorithm.
* Contributed by Jonathan Ruckwood <jonathan.ruckwood@gmail.com>
*
* Adapted by Dominik Hurnaus <dominik.hurnaus@gmail.com> to
* - correctly sort words where one word starts with another word
* - have slightly better performance
*
* Released under the MIT License - https://opensource.org/licenses/MIT
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
using System;
using System.Collections;
using System.Text;
/*
* Please compare against the latest Java version at http://www.DaveKoelle.com
* to see the most recent modifications
*/
namespace AlphanumComparator
{
public class AlphanumComparator : IComparer
{
private enum ChunkType {Alphanumeric, Numeric};
private bool InChunk(char ch, char otherCh)
{
ChunkType type = ChunkType.Alphanumeric;
if (char.IsDigit(otherCh))
{
type = ChunkType.Numeric;
}
if ((type == ChunkType.Alphanumeric && char.IsDigit(ch))
|| (type == ChunkType.Numeric && !char.IsDigit(ch)))
{
return false;
}
return true;
}
public int Compare(object x, object y)
{
String s1 = x as string;
String s2 = y as string;
if (s1 == null || s2 == null)
{
return 0;
}
int thisMarker = 0, thisNumericChunk = 0;
int thatMarker = 0, thatNumericChunk = 0;
while ((thisMarker < s1.Length) || (thatMarker < s2.Length))
{
if (thisMarker >= s1.Length)
{
return -1;
}
else if (thatMarker >= s2.Length)
{
return 1;
}
char thisCh = s1[thisMarker];
char thatCh = s2[thatMarker];
StringBuilder thisChunk = new StringBuilder();
StringBuilder thatChunk = new StringBuilder();
while ((thisMarker < s1.Length) && (thisChunk.Length==0 ||InChunk(thisCh, thisChunk[0])))
{
thisChunk.Append(thisCh);
thisMarker++;
if (thisMarker < s1.Length)
{
thisCh = s1[thisMarker];
}
}
while ((thatMarker < s2.Length) && (thatChunk.Length==0 ||InChunk(thatCh, thatChunk[0])))
{
thatChunk.Append(thatCh);
thatMarker++;
if (thatMarker < s2.Length)
{
thatCh = s2[thatMarker];
}
}
int result = 0;
// If both chunks contain numeric characters, sort them numerically
if (char.IsDigit(thisChunk[0]) && char.IsDigit(thatChunk[0]))
{
thisNumericChunk = Convert.ToInt32(thisChunk.ToString());
thatNumericChunk = Convert.ToInt32(thatChunk.ToString());
if (thisNumericChunk < thatNumericChunk)
{
result = -1;
}
if (thisNumericChunk > thatNumericChunk)
{
result = 1;
}
}
else
{
result = thisChunk.ToString().CompareTo(thatChunk.ToString());
}
if (result != 0)
{
return result;
}
}
return 0;
}
}
}
Since the number of characters at the beginning is variable, a regular expression would help:
var re = new Regex(@"\d+$"); // finds the consecutive digits at the end of the string
var result = partNumbers.OrderBy(x => int.Parse(re.Match(x).Value));
If there were a fixed number of prefix characters, then you could use the Substring
method to extract starting from the relevant characters:
// parses the string as a number starting from the 5th character
var result = partNumbers.OrderBy(x => int.Parse(x.Substring(4)));
If the numbers might contain a decimal separator or thousands separator, then the regular expression needs to allow those characters as well:
var re = new Regex(@"[\d,]*\.?\d+$");
var result = partNumbers.OrderBy(x => double.Parse(x.Substring(4)));
If the string returned by the regular expression or Substring
might be unparseable by int.Parse
/ double.Parse
, then use the relevant TryParse
variant:
var re = new Regex(@"\d+$"); // finds the consecutive digits at the end of the string
var result = partNumbers.OrderBy(x => {
int? parsed = null;
if (int.TryParse(re.Match(x).Value, out var temp)) {
parsed = temp;
}
return parsed;
});
Just extending @Nathan's answer here.
var maxStringLength = partNumbers.Max(x => x).Count();
var result = partNumbers.OrderBy(x => PadNumbers(x, maxStringLength));
Then pass the param to the PadNumbers function will be dynamic.
public static string PadNumbers(string input, int maxStringLength)
{
return Regex.Replace(input, "[0-9]+", match => match.Value.PadLeft(maxStringLength, '0'));
}
I don´t know how to do that in LINQ, but maybe you like this way to:
Array.Sort(partNumbers, new AlphanumComparatorFast());
// Display the results
foreach (string h in partNumbers )
{
Console.WriteLine(h);
}
精彩评论