开发者

sorting collections with string elements in java

I have a collections...i wrote code to sort values using my own comparator my comparator code is

private static class MatchComparator implements Comparator<xmlparse> {
    @Override
    public int compare(xmlparse object1, xmlparse object2) {
        St开发者_开发技巧ring match1 = object1.getMatchId();
        String match2 = object2.getMatchId();
        return match1.compareTo(match2);
    }
}

I will call Collections.sort(list,new MatchComparator());

Everything is fine but my problem is the sorted list is wrong when i print it...

Input for list

Match19
Match7
Match12
Match46
Match32

output from the sorted list

Match12
Match19
Match32
Match46
Match7

my expected output is

Match7
Match12
Match19
Match32
Match46


to get the order you need, you could either prefix the 1 digit numbers with zero ( eg Match07 ) or you have to split the string in a prefix and a numeric part, and implement the sorting as numeric comparison


The problem is that String.compareTo(..) compares the words char by char.

If all string start with Match, then you can easily fix this with:

public int compare(xmlparse object1, xmlparse object2) {
    String match1 = object1.getMatchId();
    String match2 = object2.getMatchId();
    return Integer.parseInt(match1.replace("Match"))
         - Integer.parseInt(match2.replace("Match"));
}

In case they don't start all with Match, then you can use regex:

Integer.parseInt(object1.replaceAll("[a-zA-Z]+", ""));

Or

Integer.parseInt(object1.replaceAll("[\p{Alpha}\p{Punch}]+", ""));

And a final note - name your classes with uppercase, camelCase - i.e. XmlParse instead of xmlparse - that's what the convention dictates.


Implement a function and use it for comparison:
instead of

return match1.compareTo(match2);

use

return compareNatural(match1,match2);

Here is a function which does a natural comparison on strings:

private static int compareNatural(String s, String t, boolean caseSensitive) {
    int sIndex = 0;
    int tIndex = 0;

    int sLength = s.length();
    int tLength = t.length();

    while (true) {
        // both character indices are after a subword (or at zero)

        // Check if one string is at end
        if (sIndex == sLength && tIndex == tLength) {
            return 0;
        }
        if (sIndex == sLength) {
            return -1;
        }
        if (tIndex == tLength) {
            return 1;
        }

        // Compare sub word
        char sChar = s.charAt(sIndex);
        char tChar = t.charAt(tIndex);

        boolean sCharIsDigit = Character.isDigit(sChar);
        boolean tCharIsDigit = Character.isDigit(tChar);

        if (sCharIsDigit && tCharIsDigit) {
            // Compare numbers

            // skip leading 0s
            int sLeadingZeroCount = 0;
            while (sChar == '0') {
                ++sLeadingZeroCount;
                ++sIndex;
                if (sIndex == sLength) {
                    break;
                }
                sChar = s.charAt(sIndex);
            }
            int tLeadingZeroCount = 0;
            while (tChar == '0') {
                ++tLeadingZeroCount;
                ++tIndex;
                if (tIndex == tLength) {
                    break;
                }
                tChar = t.charAt(tIndex);
            }
            boolean sAllZero = sIndex == sLength || !Character.isDigit(sChar);
            boolean tAllZero = tIndex == tLength || !Character.isDigit(tChar);
            if (sAllZero && tAllZero) {
                continue;
            }
            if (sAllZero && !tAllZero) {
                return -1;
            }
            if (tAllZero) {
                return 1;
            }

            int diff = 0;
            do {
                if (diff == 0) {
                    diff = sChar - tChar;
                }
                ++sIndex;
                ++tIndex;
                if (sIndex == sLength && tIndex == tLength) {
                    return diff != 0 ? diff : sLeadingZeroCount - tLeadingZeroCount;
                }
                if (sIndex == sLength) {
                    if (diff == 0) {
                        return -1;
                    }
                    return Character.isDigit(t.charAt(tIndex)) ? -1 : diff;
                }
                if (tIndex == tLength) {
                    if (diff == 0) {
                        return 1;
                    }
                    return Character.isDigit(s.charAt(sIndex)) ? 1 : diff;
                }
                sChar = s.charAt(sIndex);
                tChar = t.charAt(tIndex);
                sCharIsDigit = Character.isDigit(sChar);
                tCharIsDigit = Character.isDigit(tChar);
                if (!sCharIsDigit && !tCharIsDigit) {
                    // both number sub words have the same length
                    if (diff != 0) {
                        return diff;
                    }
                    break;
                }
                if (!sCharIsDigit) {
                    return -1;
                }
                if (!tCharIsDigit) {
                    return 1;
                }
            } while (true);
        } else {
            // Compare words
            // No collator specified. All characters should be ascii only. Compare character-by-character.
            do {
                if (sChar != tChar) {
                    if (caseSensitive) {
                        return sChar - tChar;
                    }
                    sChar = Character.toUpperCase(sChar);
                    tChar = Character.toUpperCase(tChar);
                    if (sChar != tChar) {
                        sChar = Character.toLowerCase(sChar);
                        tChar = Character.toLowerCase(tChar);
                        if (sChar != tChar) {
                            return sChar - tChar;
                        }
                    }
                }
                ++sIndex;
                ++tIndex;
                if (sIndex == sLength && tIndex == tLength) {
                    return 0;
                }
                if (sIndex == sLength) {
                    return -1;
                }
                if (tIndex == tLength) {
                    return 1;
                }
                sChar = s.charAt(sIndex);
                tChar = t.charAt(tIndex);
                sCharIsDigit = Character.isDigit(sChar);
                tCharIsDigit = Character.isDigit(tChar);
            } while (!sCharIsDigit && !tCharIsDigit);
        }
    }
}

a better one is here


The comparison is lexicographic and not numerical, that's your problem. In lexicoraphic ordering, 10 comes before 9.

See this question for open source implementation solutions. You can also implement your own string comparison, which shouldn't be that hard.


It seems that you expect not what the String.compareTo() really is. It performs so called lexicographical comarsion, but you try to compare it by number. You need to modify the code of your comparator.

  @Override
        public int compare(xmlparse object1, xmlparse object2) {
            String match1 = object1.getMatchId();
            String match2 = object2.getMatchId();

            Long n1 = getNumber(match1);
            Long n2 = getNumber(match2);

            return n1.compareTo(n2);             
        }

where getNumber() extracts the last nuber from string "matchXX"

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜