Find words in a HashSet or TreeSet?
I am piping a file into my program and storing its words in a TreeSet so that I can count the unique words. The words I don't want to count ("a", "the", "and") go into a HashSet.
I want to check each word from the file against that HashSet before I place it into the TreeSet. I know I need some sort of if (word == find) check, but I don't know how to write it.
Sorry about the formatting; it's hard to get it right after pasting.
This is what I have:
import java.util.Scanner;
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.HashSet;

public class Project1
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        String word;
        String grab;
        int count = 0;
        int count2 = 0;
        int count3 = 0;
        int count4 = 0;
        int number;
        TreeSet<String> a = new TreeSet<String>();
        HashSet<String> find = new HashSet<String>();
        System.out.println("Project 1\n");
        find.add("a");
        find.add("and");
        find.add("the");
        while (sc.hasNext())
        {
            word = sc.next();
            word = word.toLowerCase();
            for (int i = 0; i < word.length(); i++)
            {
                if (Character.isDigit(word.charAt(i)))
                {
                    count3++;
                }
            }
            //if( a.contains("a") )
            //|| word.matches("and") || word.matches("the")|| word.contains("$"))
            //{
            //    count2++;
            //}
            a.add(word);
            if (word.equals("---"))
            {
                break;
            }
        }
        System.out.println("a size");
        System.out.println(a.size());
        // count = count2 - count;
        System.out.println("unique words");
        System.out.println(a.size() - count2 - count3);
        System.out.println("\nbye...");
    }
}
I see you're using SO for the whole project.
You can do something along the lines of:
if (!find.contains(word)) {
    a.add(word); // only add words that are not in the excluded set
}
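As a rough sketch of how that check could fit into the counting loop, here is a small self-contained version. The class name `UniqueWordFilter`, the constant `STOP_WORDS`, and the method `uniqueWords` are names made up for this example, and the input is a hard-coded list rather than your piped file. Note that `contains()` compares strings with `equals()`, so there is no need for `==` here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UniqueWordFilter {
    // Words to leave out of the count (the three from the question).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "the"));

    // Returns the distinct lower-cased words in sorted order,
    // skipping anything found in STOP_WORDS.
    static Set<String> uniqueWords(List<String> tokens) {
        Set<String> unique = new TreeSet<>();
        for (String token : tokens) {
            String word = token.toLowerCase();
            if (!STOP_WORDS.contains(word)) {
                unique.add(word);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Hard-coded sample input; a Scanner loop would feed tokens the same way.
        List<String> tokens =
                Arrays.asList("The", "cat", "and", "the", "dog", "saw", "a", "cat");
        Set<String> unique = uniqueWords(tokens);
        System.out.println(unique);        // [cat, dog, saw]
        System.out.println(unique.size()); // 3
    }
}
```

Because the excluded words never reach the TreeSet, `size()` is already the unique-word count and nothing has to be subtracted afterwards.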
It's somewhat tangential to your question, but it's never too soon to learn to code to the interface. In your example,
TreeSet<String> a = new TreeSet<String>();
HashSet<String> find = new HashSet<String>();
might be better as
Set<String> uniqueWords = new TreeSet<String>();
Set<String> trivialWords = new HashSet<String>();
Using the interface type puts the focus on the Set functionality of the two collections. It also allows you to choose a different implementation easily at a later time, as your program evolves. Descriptive names are a good habit, too.
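To illustrate the point, here is a minimal sketch in which a helper is written against `Set<String>` only, so the same code runs unchanged with either implementation. The class `InterfaceDemo` and the method `addAll` are hypothetical names for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class InterfaceDemo {
    // Depends only on the Set interface, so any implementation can be passed in.
    // Returns how many of the given words were newly added.
    static int addAll(Set<String> target, String... words) {
        int added = 0;
        for (String w : words) {
            if (target.add(w)) { // add() returns false for duplicates
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Set<String> sorted = new TreeSet<>();
        Set<String> hashed = new HashSet<>();
        System.out.println(addAll(sorted, "pear", "apple", "pear")); // 2
        System.out.println(addAll(hashed, "pear", "apple", "pear")); // 2
        System.out.println(sorted); // [apple, pear]
    }
}
```

Switching from one implementation to the other then means changing only the `new ...` expression, not the code that uses the set.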
To look up an element:
HashSet: contains() takes O(1) (constant) time on average.
TreeSet: contains() takes O(log n) time, which is slower than constant time for large sets (how much slower depends on the data).
If you often need the elements in their natural (sorted) order, use TreeSet. Otherwise, use HashSet.
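A small sketch of the difference, assuming a hand-picked word list: both sets answer `contains()` the same way, but only the TreeSet exposes ordered operations such as `first()` and `headSet()`. The class `SetChoiceDemo` and the helper `sortedCopy` are names invented for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetChoiceDemo {
    // Copies the words into a TreeSet, which keeps them in natural order.
    static TreeSet<String> sortedCopy(Collection<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig", "apple");

        Set<String> hashed = new HashSet<>(words);
        TreeSet<String> sorted = sortedCopy(words);

        // Membership tests work on both.
        System.out.println(hashed.contains("fig")); // true
        System.out.println(sorted.contains("fig")); // true

        // Ordered views are only available on the TreeSet.
        System.out.println(sorted);              // [apple, fig, pear]
        System.out.println(sorted.first());      // apple
        System.out.println(sorted.headSet("g")); // [apple, fig]
    }
}
```

For the word-count case in the question, the excluded words only ever need `contains()`, so a HashSet suits them, while the TreeSet is worth it for the result only if you want the unique words printed in sorted order.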