Java: String: equalsIgnoreCase vs switching everything to Upper/Lower Case
It came to my attention that there are several ways to compare strings in Java.
I got into the habit ages ago of using equalsIgnoreCase to avoid problems with case-sensitive strings.
Others, on the other hand, prefer converting everything to upper or lower case.
From where I stand (even if technically I'm sitting), I don't see a real difference.
Does anybody know if one practice is better than the other? And if so, why?
Use equalsIgnoreCase because it's more readable than converting both Strings to uppercase before a comparison. Readability trumps micro-optimization.
What's more readable?
if (myString.toUpperCase().equals(myOtherString.toUpperCase())) {
or
if (myString.equalsIgnoreCase(myOtherString)) {
I think we can all agree that equalsIgnoreCase is more readable.
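equalsIgnoreCase avoids problems with Locale-specific differences, because it compares character by character without consulting a Locale, whereas the no-argument toUpperCase() and toLowerCase() use the default Locale (e.g. in the Turkish Locale there are two different uppercase "i" letters). On the other hand, a HashMap compares keys with equals() (and hashCode()), so keys that should match case-insensitively have to be normalized before they go in.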
But the issue with the latter, where you assume that the input always arrives in upper or lower case, is that you cannot blindly trust the caller. So you have to include an assert statement at the start of the method to make sure that the input is always in the case you are expecting.
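A minimal sketch of the locale issue mentioned above, assuming Java 8 or later. The class and method names are made up for illustration. It passes an explicit Locale so the result does not depend on the machine's default setting:

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    // Hypothetical helper: compares two strings by lowercasing both with the given locale.
    public static boolean equalsViaLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // In Turkish, uppercase 'I' lowercases to dotless 'ı' (U+0131), so the strings differ.
        System.out.println(equalsViaLowerCase("TITLE", "title", turkish));     // false

        // Locale.ROOT gives locale-neutral case mapping.
        System.out.println(equalsViaLowerCase("TITLE", "title", Locale.ROOT)); // true

        // equalsIgnoreCase does not consult any locale.
        System.out.println("TITLE".equalsIgnoreCase("title"));                 // true
    }
}
```

If you do normalize case yourself, passing Locale.ROOT (or another fixed Locale) is one way to keep the behaviour the same on every machine.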
Neither is better, they both have their uses in different scenarios.
Many times when you have to do string comparisons there is the opportunity to massage at least one of the strings to make it easier to compare, and in these cases you will see strings converted to a particular case, trimmed, etc before being compared.
If, on the other hand, you just want to do an on-the-fly case-insensitive comparison of two strings then feel free to use equalsIgnoreCase, that's what it's there for after all. I would caution, however, that if you're seeing a lot of equalsIgnoreCase it could be a code smell.
It depends on the use case.
If you're doing a one to one string comparison, equalsIgnoreCase is probably faster, since internally it just uppercases each character as it iterates through the strings (below code is from java.lang.String), which is slightly faster than uppercasing or lowercasing them all before performing the same comparison:
if (ignoreCase)
{
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion. So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}
But when you have a situation where you want to do lookups against a data structure full of strings (especially strings that are all in the US Latin/ASCII space) in a case insensitive manner, it will be quicker to trim/lowercase the strings to be checked against and put them in something like a HashSet or HashMap.
This is better than calling equalsIgnoreCase on each element of a List because the slight performance gain of equalsIgnoreCase() is canceled out by the fact that you're basically doing a modified version of contains() against an array, which is O(n). With pre-normalized strings in a HashSet you can check against the entire collection with a single contains() call that runs in O(1) on average.
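The two lookup strategies could be sketched roughly like this. The class, method names and sample data are hypothetical, and normalize() assumes that trimming plus Locale.ROOT lowercasing is the normalization you want:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CaseInsensitiveLookup {
    // Hypothetical normalization: trim and lowercase with a fixed locale.
    public static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // O(n) per lookup: compares the candidate against every element.
    public static boolean containsLinear(List<String> values, String candidate) {
        for (String value : values) {
            if (value.equalsIgnoreCase(candidate)) {
                return true;
            }
        }
        return false;
    }

    // One-time O(n) cost; afterwards each lookup is O(1) on average.
    public static Set<String> buildIndex(List<String> values) {
        Set<String> index = new HashSet<>();
        for (String value : values) {
            index.add(normalize(value));
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "BOB", "carol");
        Set<String> index = buildIndex(names);

        System.out.println(containsLinear(names, "bob"));       // true
        System.out.println(index.contains(normalize(" Bob "))); // true
        System.out.println(index.contains(normalize("dave")));  // false
    }
}
```

Note that the candidate has to go through the same normalize() call as the stored values, otherwise the lookup silently misses.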
The equalsIgnoreCase documentation in JDK 8:
Compares this String to another String, ignoring case considerations. Two strings are considered equal ignoring case if they are of the same length and corresponding characters in the two strings are equal ignoring case.
Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:
- The two characters are the same (as compared by the == operator)
- Applying the method java.lang.Character.toUpperCase(char) to each character produces the same result
- Applying the method java.lang.Character.toLowerCase(char) to each character produces the same result
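The documented rules above can be written out as a plain loop. This is only an illustration of the rules with a made-up method name, not the JDK's actual implementation:

```java
public class EqualsIgnoreCaseSketch {
    // Hypothetical helper that mirrors the documented rules; not the JDK source.
    public static boolean equalsIgnoreCaseSketch(String a, String b) {
        if (a == b) {
            return true;                 // same reference (also covers both null)
        }
        if (a == null || b == null) {
            return false;
        }
        if (a.length() != b.length()) {
            return false;                // different lengths can never match
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue;                // rule 1: identical characters
            }
            if (Character.toUpperCase(c1) == Character.toUpperCase(c2)) {
                continue;                // rule 2: equal after uppercasing
            }
            if (Character.toLowerCase(c1) == Character.toLowerCase(c2)) {
                continue;                // rule 3: equal after lowercasing
            }
            return false;                // first real mismatch ends the scan
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCaseSketch("Hello", "hELLO")); // true
        System.out.println(equalsIgnoreCaseSketch("abc", "abcd"));    // false
        System.out.println(equalsIgnoreCaseSketch("abc", "abd"));     // false
    }
}
```

The loop allocates nothing and stops at the first mismatch.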
My thoughts:
So using equalsIgnoreCase we iterate through the Strings (only if their lengths are the same), comparing each char. In the worst case, performance will be O(3cn), which is still linear, where n is the length of your strings. We use no extra space.
Using toUpperCase() and then checking whether the strings are equal, you ALWAYS loop through each string once to convert it to upper case, and then do a content comparison with equals() (equals() compares characters, not references). The conversions alone are Θ(2n + c), and the equals() pass comes on top of that, so both approaches are linear in n. But remember, when you call toUpperCase() you may have to create two new Strings, because Strings in Java are immutable.
So I would say that equalsIgnoreCase is both more efficient and easier to read.
Again I would consider the use case, because that would be what it comes down to for me. The toUpper approach could be valid in certain use cases, but 98% of the time I use equalsIgnoreCase().
Performance-wise both are the same according to this post:
http://www.params.me/2011/03/stringtolowercasestringtouppercase-vs.html
So I would decide based on code readability. In some cases toLowerCase() would be better, for example if I always pass the value to a single method that creates objects; otherwise equalsIgnoreCase() makes more sense.
When I'm working with English-only characters, I always run toUpperCase() or toLowerCase() before I start doing comparisons if I'm calling .equalsIgnoreCase() more than once or if I'm using a switch statement. This way it does the case-change operation only once, and so is more efficient.
For example, in a factory pattern:
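public static SuperObject objectFactory(String objectName) {
    // Normalize the case once, then switch on the result.
    // No break is needed (or allowed) after a return: it would be unreachable code.
    switch (objectName.toUpperCase()) {
        case "OBJECT1":
            return new SubObject1();
        case "OBJECT2":
            return new SubObject2();
        case "OBJECT3":
            return new SubObject3();
    }
    // Unknown name
    return null;
}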
(Using a switch statement is slightly faster than if..else if..else blocks for String comparison)
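For anyone who wants to run the factory, here is a self-contained sketch. The nested classes are empty stand-ins I made up for SuperObject and its subclasses. The null guard and the explicit Locale.ROOT are additions that were not in the snippet above:

```java
import java.util.Locale;

public class FactoryDemo {
    // Hypothetical stand-ins for the class hierarchy used in the factory.
    public static class SuperObject {}
    public static class SubObject1 extends SuperObject {}
    public static class SubObject2 extends SuperObject {}

    public static SuperObject objectFactory(String objectName) {
        if (objectName == null) {
            return null; // switching on a null String would throw NullPointerException
        }
        // Uppercase once with a fixed locale, then switch on the normalized value.
        switch (objectName.toUpperCase(Locale.ROOT)) {
            case "OBJECT1":
                return new SubObject1();
            case "OBJECT2":
                return new SubObject2();
            default:
                return null; // unknown name
        }
    }

    public static void main(String[] args) {
        System.out.println(objectFactory("object1").getClass().getSimpleName()); // SubObject1
        System.out.println(objectFactory("Object2").getClass().getSimpleName()); // SubObject2
        System.out.println(objectFactory("something else"));                     // null
    }
}
```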