How to count duplicates in an array of strings?
How do I partition a String to extract all the words/terms that occur in it and count how many times each occurs?
For example, let:
String q = "foo bar foo";
I want a data structure like {<foo,2>, <bar,1>}.
This is the least verbose code I could come up with*. Are there any faults, or less verbose alternatives?
String[] split = q.toString().split("\\s");
Map<String, Integer> terms = new HashMap<String, Integer>();
for (String term : split) {
    if (terms.containsKey(term)) {
        terms.put(term, terms.get(term) + 1);
    }
}
(haven't compiled it)
Modified code:
String[] split = q.toString().split("\\s");
Map<String, Integer> terms = new HashMap<String, Integer>();
for (String term : split) {
    int score = 0;
    if (terms.containsKey(term)) {
        score = terms.get(term);
    }
    terms.put(term, score + 1);
}
PS: Untested.
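As for a less verbose alternative: assuming Java 8 or later is available, Map.merge can replace the containsKey/get/put sequence. The sketch below is only one way to write it; the class name TermCounter and the method name countTerms are made up for illustration. It also splits on "\\s+" after trim(), which is a change from the code above, so that runs of whitespace do not produce empty terms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
class TermCounter {

    // Counts how often each whitespace-separated term occurs in q.
    static Map<String, Integer> countTerms(String q) {
        Map<String, Integer> terms = new HashMap<>();
        // trim() drops leading/trailing whitespace; "\\s+" treats a run of
        // whitespace as a single separator.
        for (String term : q.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // splitting an empty string yields one empty token
            }
            // Store 1 if the term is new, otherwise add 1 to the old count.
            terms.merge(term, 1, Integer::sum);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(countTerms("foo bar foo"));
    }
}
```

For "foo bar foo" the resulting map holds foo=2 and bar=1. HashMap makes no promise about iteration order, so the printed order should not be relied on.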
I would go with the code suggested by Elite Gentleman, but I'm just putting this out as a discussion point: what about using StringTokenizer? If scalability/performance were an issue, would the tokenizer perform better? In that case you may only have to loop through the string once, as opposed to doing the regex split first and then traversing the resulting array.
Something like this:
// s is the input string; the default delimiters are space, tab, newline, carriage return and form feed
StringTokenizer st = new StringTokenizer(s);
Map<String, Integer> terms = new HashMap<String, Integer>();
while (st.hasMoreTokens()) {
    String term = st.nextToken();
    int score = 0;
    if (terms.containsKey(term)) {
        score = terms.get(term);
    }
    terms.put(term, score + 1);
}
I know that StringTokenizer, though not deprecated, is a legacy class according to the Java docs and its use is not recommended:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
However, I wonder whether, for simple whitespace-delimited tokens like these, it would perform better.
Any thoughts?
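Before comparing speed, it may be worth noting that the two approaches do not always return the same terms: split("\\s") keeps the empty string between two consecutive whitespace characters, whereas StringTokenizer skips it. Below is a rough sketch that shows the difference and takes a naive timing. The class and method names are invented for this example, Map.merge assumes Java 8 or later, and the System.nanoTime measurement is a single run without warm-up, so it is a hint at best and not a proper benchmark.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical comparison class, named for illustration only.
class TokenizerComparison {

    // Counts terms using the regex split from the question.
    static Map<String, Integer> countWithSplit(String s) {
        Map<String, Integer> terms = new HashMap<>();
        for (String term : s.split("\\s")) {
            terms.merge(term, 1, Integer::sum);
        }
        return terms;
    }

    // Counts terms using StringTokenizer with its default delimiters.
    static Map<String, Integer> countWithTokenizer(String s) {
        Map<String, Integer> terms = new HashMap<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            terms.merge(st.nextToken(), 1, Integer::sum);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Two spaces between "foo" and "bar": split reports an extra empty term.
        String s = "foo  bar foo";
        System.out.println("split:     " + countWithSplit(s));
        System.out.println("tokenizer: " + countWithTokenizer(s));

        // Naive timing on a larger input (single run, no JIT warm-up).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("foo bar ");
        }
        String big = sb.toString();

        long t0 = System.nanoTime();
        countWithSplit(big);
        long t1 = System.nanoTime();
        countWithTokenizer(big);
        long t2 = System.nanoTime();

        System.out.printf("split: %d us, tokenizer: %d us%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000);
    }
}
```

If the timing really matters, a harness such as JMH would give more trustworthy numbers than this one-shot measurement.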