Remove stop words in Java: help needed

I'm using a method that strips the stop words defined in a file out of the query string I pass to it. The code is working fine.
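The question doesn't show how StopWordDataLoad reads the file, so its format is unknown. Here is a minimal sketch of loading such a file into a Set<String>. It assumes a UTF-8 text file with one stop word per line, and StopWordLoader and load are hypothetical names. A HashSet makes each contains() check constant time on average, whereas a List scans every entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordLoader {

    // Hypothetical loader: assumes one stop word per line in a UTF-8 text file.
    // Surrounding whitespace is trimmed and blank lines are skipped.
    static Set<String> load(Path file) throws IOException {
        Set<String> stopWords = new HashSet<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                // Store lower-case so later lookups can be case-insensitive
                stopWords.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return stopWords;
    }
}
```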

Now I need one more behavior: if the query string consists only of stop words, they should not be stripped out.

For example, if the stop-words file contains "is", "was" and "and":

If the query is "I was a student", the output should be "I a student".

But if the query is "and is ", the output should stay the same: "and is".
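The rule in these examples is "drop stop words, unless every word is a stop word". Here is a minimal sketch of that rule on plain search text, independent of the query-string handling below. StopWordFilter is a hypothetical name, and a Set<String> stands in for the list from the stop-words file. Unlike the posted method, which compares tokens case-sensitively, this sketch lower-cases each token and assumes the set holds lower-case entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordFilter {

    // Removes stop words from plain search text. If every word is a stop word,
    // the trimmed input is returned as-is instead of an empty string.
    static String removeStopWords(String text, Set<String> stopWords) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        List<String> kept = new ArrayList<>();
        for (String token : trimmed.split("\\s+")) {
            // Case-insensitive lookup; assumes stopWords holds lower-case entries
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        // Nothing survived: the query was made only of stop words, so keep it
        return kept.isEmpty() ? trimmed : String.join(" ", kept);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "was", "and"));
        System.out.println(removeStopWords("I was a student", stopWords)); // I a student
        System.out.println(removeStopWords("and is ", stopWords));          // and is
    }
}
```

The key design point is checking whether anything survived before returning. An empty result is the signal that the input was made only of stop words.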

Below is the method I wrote to remove stop words.

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public static String removeStopWords(String query) throws UnsupportedEncodingException
{
    // Pull the search text out of the "q=" parameter of the query string
    String[] queryTerms = query.split("&");
    String queryString = "";
    for (int i = 0; i < queryTerms.length; i++) {
        if (queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")) {
            // substring(2) drops only the leading "q=", not every "q=" in the text
            queryString = queryTerms[i].substring(2).trim().replace("+", " ").replaceAll("\\s+", " ").trim();
        }
    }
    if (queryString.isEmpty()) {
        return query; // no q= parameter: nothing to filter
    }

    // Drop every token that appears in the stop-word list
    String[] tokens = queryString.split("\\s+");
    List<?> lStopWords = StopWordDataLoad.getlQueryStringStopword();
    StringBuilder sb = new StringBuilder();
    boolean foundStopWord = false;
    for (String s : tokens) {
        if (!lStopWords.contains(s)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(s);
        } else {
            foundStopWord = true;
        }
    }
    queryString = sb.toString();

    // Return the query untouched if nothing is left (only stop words)
    // or if no stop word was found (nothing to change)
    if (queryString.isEmpty() || !foundStopWord) {
        return query;
    }

    // Rebuild the parameter list with the filtered, re-encoded q= value
    List<String> list = new ArrayList<String>();
    for (int i = 0; i < queryTerms.length; i++) {
        if (queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")) {
            list.add("q=" + URLEncoder.encode(queryString, PropertyLoader.getHttpEncoding()));
        } else if (!queryTerms[i].isEmpty()) {
            list.add(queryTerms[i]);
        }
    }

    String finQue = "";
    for (String str : list) {
        finQue = finQue + "&" + str; // note: finQue always starts with "&"
    }

    return finQue.trim();
}
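The method above extracts the q= parameter by hand (turning "+" into spaces) and then rebuilds the string. Here is a self-contained sketch of the same flow using java.net.URLDecoder and URLEncoder. Several pieces are assumptions: UTF-8 stands in for PropertyLoader.getHttpEncoding(), a plain Set<String> stands in for StopWordDataLoad, and QueryRewriter and rewrite are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class QueryRewriter {

    // Assumption: UTF-8 stands in for PropertyLoader.getHttpEncoding()
    private static final String ENCODING = "UTF-8";

    static String rewrite(String query, Set<String> stopWords)
            throws UnsupportedEncodingException {
        List<String> params = new ArrayList<>();
        boolean removedAny = false;
        for (String param : query.split("&")) {
            if (param.isEmpty()) {
                continue; // drop empty segments, e.g. from "&&"
            }
            if (!param.startsWith("q=")) {
                params.add(param); // other parameters pass through unchanged
                continue;
            }
            // Decode "+" and %XX escapes back to plain search text
            String text = URLDecoder.decode(param.substring(2), ENCODING).trim();
            String[] tokens = text.split("\\s+");
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                    kept.add(token);
                }
            }
            if (kept.isEmpty() || kept.size() == tokens.length) {
                params.add(param); // only stop words, or none at all: keep q= as is
            } else {
                removedAny = true;
                params.add("q=" + URLEncoder.encode(String.join(" ", kept), ENCODING));
            }
        }
        // String.join puts "&" only between parameters, so there is no leading "&"
        return removedAny ? String.join("&", params) : query;
    }
}
```

With the stop words "is", "was" and "and", rewrite("q=I+was+a+student&start=0", stopWords) should give "q=I+a+student&start=0". Meanwhile "q=and+is&start=0" comes back unchanged. Decoding with URLDecoder also handles %XX escapes, which a plain replace("+", " ") does not.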


Just change the last line to this:

String result = finQue.trim();
if (result.equals("")) {
    return query;
} else {
    return result;
}
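In the posted method, finQue is built as finQue + "&" + str. It therefore begins with "&" whenever at least one parameter survives, and the trimmed result is empty only when no parameters are left. Here is a minimal sketch that joins the parameters without the leading "&" and then applies the same fallback. QueryJoin and joinOrFallback are hypothetical names.

```java
import java.util.List;

class QueryJoin {

    // Hypothetical helper: joins the surviving parameters with "&" only between
    // them (no leading "&"), then falls back to the original query when the
    // joined result is empty.
    static String joinOrFallback(List<String> params, String query) {
        String result = String.join("&", params).trim();
        return result.isEmpty() ? query : result;
    }
}
```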