I was wondering if somehow (maybe with an algorithm) a submitted text like the one below can be summarized (removing the common words)
I have a unigram language model and I want to smooth the counts. Is add-one smoothing the only way, or can I use some other smoothing as well? I don't think we can use Kneser-Ney, as that is for n-grams with
I'd like to ask questions about personalized search. I'm about to design/implement a personalized search with Lucene. I did some googling about that, but didn't seem to find module/t
I'm looking for a Java library that can do named entity recognition (NER) with a custom controlled vocabulary, without needing labeled training data first. I searched some on SE, but most questions a
I am looking for a simple script that can find the frequencies of words in a given document (probably by using the Porter stemmer).
I want to take what people chat about in a chat room and do the following information retrieval: Get the keywords
How can I rank keywords that have been used in search queries in my web application, based on time and number of searches?
Imagine I have a huge database of threads and posts (about 10,000,000 records) from different forum sites, including several subforums, that serve as my Lucene documents.
I'm using Java and Jsoup to parse HTML pages, and I want to get all the divs that do not contain other divs and print the text they contain.