Find words in a HashSet or TreeSet?

I am piping in a file and storing its words in a TreeSet so that I can count the unique words. The words I don't want to count ("a", "the", "and") go into a HashSet.

Before I add a word to the TreeSet, I want to check whether it is one of those unwanted words. I know I need some sort of if (word == find) test, but I just don't know how to write it.

Sorry about the formatting. It's hard to get it right after pasting.

This is what I have:

import java.util.Scanner;
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.HashSet;

public class Project1
{
    public static void main(String[] args)
    {
        Scanner     sc = new Scanner(System.in);    
        String      word;
        String grab;
        int count = 0;
        int count2 =0;
        int count3 =0;
        int count4 =0;
        int number;
        TreeSet<String> a = new TreeSet<String>();
        HashSet<String> find = new HashSet<String>();

        System.out.println("Project 1\n");
        find.add("a");
        find.add("and");
        find.add("the");

        while (sc.hasNext()) 
        {   
            word = sc.next();
            word = word.toLowerCase();
            for(int i = 0; i < word.length(); i++ )
            {
                if(Character.isDigit(word.charAt(i))) 
                {
                    count3++;  
                }
            }
            //if( a.contains("a") )
            //|| word.matches("and") || word.matches("the")|| word.contains("$"))
            //{
            //   count2++;
            // }
            a.add(word);
            if (word.equals("---"))
            {
                break;
            }
        }

        System.out.println("a size");
        System.out.println(a.size());
        // count = count2 - count;
        System.out.println("unique words");
        System.out.println(a.size() -  count2 - count3);
        System.out.println("\nbye...");
    }
}


I see you're using SO for the whole project.

You can do something along the lines of:

if(!find.contains(word)){
    //addTheWord
}
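
To put that check in context, here is a minimal sketch of the whole read loop. The class name UniqueWordCounter, the method collectUniqueWords, and the sample input are made up for illustration; the stop words and the "---" sentinel come from the question. Unlike the original loop, this version tests for the sentinel before adding, so "---" itself never lands in the set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class UniqueWordCounter {
    // Words to skip, taken from the question.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "the"));

    // Reads whitespace-separated tokens until "---" or end of input and
    // returns the distinct lowercased words that are not stop words.
    static Set<String> collectUniqueWords(Scanner sc) {
        Set<String> unique = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next().toLowerCase();
            if (word.equals("---")) {
                break; // sentinel: stop before it is added to the set
            }
            if (!STOP_WORDS.contains(word)) {
                unique.add(word); // a Set silently ignores duplicates
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; use new Scanner(System.in) for a piped file.
        Scanner sc = new Scanner("The cat and the dog saw a cat --- ignored");
        Set<String> unique = collectUniqueWords(sc);
        System.out.println(unique);                           // [cat, dog, saw]
        System.out.println("unique words: " + unique.size()); // unique words: 3
    }
}
```

Because the stop words never enter the set, unique.size() is already the answer and no extra counters need to be subtracted afterwards.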


It's somewhat tangential to your question, but it's never too soon to learn to code to the interface. In your example,

TreeSet<String> a = new TreeSet<String>();
HashSet<String> find = new HashSet<String>();

might be better as

Set<String> uniqueWords = new TreeSet<String>();
Set<String> trivialWords = new HashSet<String>();

Using the interface type puts the focus on the Set functionality of the two collections. It also allows you to choose a different implementation easily at a later time, as your program evolves. Descriptive names are a good habit, too.
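
As a rough illustration of why this helps, the sketch below passes two different implementations to the same method. The class InterfaceDemo and the helper addAll are hypothetical names, not part of the question's code.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class InterfaceDemo {
    // Depends only on the Set interface, so any implementation can be passed in.
    static int addAll(Set<String> target, String... words) {
        for (String w : words) {
            target.add(w);
        }
        return target.size();
    }

    public static void main(String[] args) {
        Set<String> sorted = new TreeSet<>();           // keeps natural order
        Set<String> insertion = new LinkedHashSet<>();  // keeps insertion order

        addAll(sorted, "pear", "apple", "pear");
        addAll(insertion, "pear", "apple", "pear");

        System.out.println(sorted);     // [apple, pear]
        System.out.println(insertion);  // [pear, apple]
    }
}
```

Switching implementations only touches the line that calls new; everything written against Set stays the same.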


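To look up an element:
HashSet: contains() takes O(1) time on average (constant time, assuming the hash function spreads elements well).
TreeSet: contains() takes O(log n) time.
For large sets log n exceeds that constant, although which is faster in practice depends on the data.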

If you often need the elements in their natural (sorted) order, use TreeSet. Otherwise, use HashSet.
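
A small sketch of that trade-off, with made-up sample words and a hypothetical helper wordsBefore: both sets answer contains(), but only the TreeSet can hand back its elements in order or answer range queries.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class SetChoiceDemo {
    // Returns, in sorted order, the words that come strictly before `bound`.
    static SortedSet<String> wordsBefore(Collection<String> words, String bound) {
        return new TreeSet<>(words).headSet(bound);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig", "apple");

        Set<String> hashed = new HashSet<>(words);      // no ordering guarantee
        TreeSet<String> sorted = new TreeSet<>(words);  // natural (alphabetical) order

        System.out.println(hashed.contains("fig"));   // true
        System.out.println(sorted.contains("fig"));   // true
        System.out.println(sorted);                   // [apple, fig, pear]
        System.out.println(sorted.first());           // apple
        System.out.println(wordsBefore(words, "g"));  // [apple, fig]
    }
}
```

For the question's stop-word list, only membership tests are needed, so a HashSet is a reasonable fit; the TreeSet earns its cost only if the unique words are later printed alphabetically.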
