word count frequency in document
I have a directory in which I have 1000 txt.files in it. I want to know for every word how many times it occurs in the 1000 document. So say even the word "cow" occured 100 times in X it will still be counted as one. If it occured in a different document it is incremented by one. So the maximum is 1000 if "cow" appears in every single document. How do I do this the easy way without the use of any other external library. Here's what I have so far
private Hashtable<String, Integer> getAllWordCount()
private Hashtable<String, Integer> getAllWordCount()
{
Hashtable<String, Integer> result = new Hashtable<String, Integer>();
HashSet<String> words = new HashSet<String>();
try {
for (int j = 0; j < fileDirectory.length; j++){
File theDirectory = new File(fileDirectory[j]);
File[] children = theDirectory.listFiles();
for (int i = 0; i < children.length; i++){
Scanner scanner = new Scanner(new FileReader(children[i]));
while (scanner.hasNext()){
String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
if (words.contains(text) == false){
if (result.get(text) == null)
result.put(text, 1);
else
result.put(text, result.get(text) + 1);
words.add(text);
}
}
}
words.clear();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println(result.size());
return res开发者_StackOverflow中文版ult;
}
You also need a HashSet<String>
in which you store each unique word you've read from the current file.
Then after every word read, you should check if it's in the set, if it isn't, increment the corresponding value in the result
map (or add a new entry if it was empty, like you already do) and add the word to the set.
Don't forget to reset the set when you start to read a new file though.
how about this?
private Hashtable<String, Integer> getAllWordCount()
{
Hashtable<String, Integer> result = new Hashtable<String, Integer>();
HashSet<String> words = new HashSet<String>();
try {
for (int j = 0; j < fileDirectory.length; j++){
File theDirectory = new File(fileDirectory[j]);
File[] children = theDirectory.listFiles();
for (int i = 0; i < children.length; i++){
Scanner scanner = new Scanner(new FileReader(children[i]));
while (scanner.hasNext()){
String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
words.add(text);
}
for (String word : words) {
Integer count = result.get(word)
if (result.get(word) == null) {
result.put(word, 1);
} else {
result.put(word, result.get(word) + 1);
}
}
words.clear();
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println(result.size());
return result;
}
精彩评论