What it is the best performance way to grab duplicates from a large Set<String>?
I have a large Set<String>
that contains many words, say:
"aaa, cCc, dDD, AAA, bbB, BBB, AaA, CCc, ..."
I want to group all duplicate words from the Set ignoring the case sensitivity of the words then save them in a Vector<Vector<String>>
or whateve开发者_JAVA百科r, so each Vector<String>
item will contain the group of similar words, like this :
Vector<String>
: aaa, AAA, AaA, ...
Vector<String>
: cCc, CCc, ...
Vector<String>
: bbB, BBB, ...
I care about the performance as this Set contain many words.
If you truly care about performance you would not use Vector
. As for the sorting problem one solution would be to use the TreeMap
or TreeSet
object and create a Comparator
that does the equality (sorting) you want.
The instantiation could be:
new TreeMap<String,LinkedList<String>>(new Comparator<String>() {
// comparator here
});
Usage:
LinkedList<String> entry = map.get(nextKey);
if (entry == null) {
entry = new LinkedList<String>()
map.put(nextKey, entry);
}
entry.add(nextKey);
I would create a HashMap<String, Vector<String>> hashMap
.
Next, for each 'string' in your set
if (!hashMap.containsKey(string.toLowerCase()){
Vector v = new Vector();
v.add(string);
hashMap.put(string.toLowerCase(), v);
} else {
hashMap.get(string.toLowerCase()).add(string);
}
At the end, create a Vector of vectors if needed, or work with the hashmap.valueSet()
If you can choose Set
implementation you can use TreeSet
with Comparator
that compares strings ignoring case. Then you will be able to iterate over sorted list and easily group duplicates.
This iterates over the input set once and I doubt you can get much faster than that. Swapping the ArrayList
s for LinkedLists
may trade locality for less copying, which may be an performance gain, but I doubt it. Here's the code:
Set<String> input = new HashSet<String>(Arrays.asList(
"aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc"));
Map<String, List<String>> tmp = new HashMap<String, List<String>>();
for (String s : input) {
String low = s.toLowerCase();
List<String> l = tmp.get(low);
if (l == null) {
l = new ArrayList<String>();
tmp.put(low, l);
}
l.add(s);
}
final List<List<String>> result = new ArrayList<List<String>>(tmp.values());
精彩评论