
How to remove English stop words in a Java program

How can I remove English stop words in a Java program? Please help me with the simplest possible program, or suggest some ideas. Thanks in advance.


You can use a regex. Here are some nice tutorials.


What exactly do you mean by stop words? Maybe the replaceAll method will do the trick.
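
To make the regex and replaceAll suggestions concrete, here is a minimal sketch. It assumes "stop words" means common function words such as "the" or "is"; the class name RegexStopWords and the short word list are illustrative only, so substitute whatever list suits your application.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexStopWords {
    // Illustrative stop-word list (an assumption, not a standard list).
    private static final List<String> STOP_WORDS =
        Arrays.asList("a", "an", "and", "in", "is", "of", "the", "to");

    // Builds \b(?:a|an|...)\b so that only whole words match, ignoring case.
    private static final Pattern STOP_WORD_PATTERN = Pattern.compile(
        "\\b(?:" + STOP_WORDS.stream()
                             .map(Pattern::quote)
                             .collect(Collectors.joining("|")) + ")\\b",
        Pattern.CASE_INSENSITIVE);

    public static String removeStopWords(String text) {
        // Replace each stop word with a space, then collapse runs of whitespace.
        String stripped = STOP_WORD_PATTERN.matcher(text).replaceAll(" ");
        return stripped.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "quick brown fox garden"
        System.out.println(removeStopWords("The quick brown fox is in the garden"));
    }
}
```

The \b word boundaries matter: without them, removing "in" would also damage words such as "ินดี"-style substrings like "inside" or "begin".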


import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Removes stop words from the "q" parameter of a URL query string.
// StopWordDataLoad and PropertyLoader are the poster's own helper classes (not shown).
// Place this method inside a class of your own.
public static String removeStopWords(String query) throws UnsupportedEncodingException
{
  String[] queryTerms = query.split("&");

  // Pull out the value of the "q" parameter and normalize its whitespace.
  String queryString = "";
  for (String term : queryTerms) {
    if (term.startsWith("q=")) {
      // substring(2) strips only the leading "q="; replaceAll("q=","") would also alter the value.
      queryString = term.substring(2).replace("+", " ").replaceAll("\\s+", " ").trim();
    }
  }
  if (queryString.isEmpty()) return query;

  // Keep every token that is not in the stop-word list.
  List<?> stopWords = StopWordDataLoad.getlQueryStringStopword();
  StringBuilder sb = new StringBuilder();
  boolean foundStopWord = false;
  for (String token : queryString.split("\\s+")) {
    if (stopWords.contains(token)) {
      foundStopWord = true;
    } else {
      if (sb.length() > 0) sb.append(' ');
      sb.append(token);
    }
  }
  queryString = sb.toString();
  // Nothing was removed, or nothing is left: return the query unchanged.
  if (!foundStopWord || queryString.isEmpty()) return query;

  // Rebuild the query string with the filtered, re-encoded "q" value.
  String encodedQ = "q=" + URLEncoder.encode(queryString, PropertyLoader.getHttpEncoding());
  List<String> params = new ArrayList<String>();
  for (String term : queryTerms) {
    if (term.startsWith("q=")) {
      params.add(encodedQ);
    } else if (!term.isEmpty()) {
      params.add(term);
    }
  }
  // String.join avoids the leading "&" that prefixing every term would produce.
  return String.join("&", params);
}
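
The method above is tied to URL query strings and to helper classes that are not shown. For plain text, the same token-filtering idea can be sketched without any of that. The class name SimpleStopWordFilter and the word list below are assumptions made for illustration, and the comparison is case-insensitive, which the method above is not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class SimpleStopWordFilter {
    // Illustrative stop-word set (an assumption); a HashSet gives fast lookups.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "in", "is", "of", "the", "to"));

    public static String removeStopWords(String text) {
        StringJoiner kept = new StringJoiner(" ");
        // Split on whitespace and keep only the tokens that are not stop words.
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // prints "How remove stop words English"
        System.out.println(removeStopWords("How to remove stop words in English"));
    }
}
```

Punctuation stays attached to its token here, so "the," would not be treated as a stop word; strip punctuation first if your input needs it.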