How to calculate IDF?
Thanks to everyone on this site for the help with TF/IDF; it helped me a lot in writing a tf-idf function in Java. I have the TF part working, but I have one question. According to the wiki, IDF is calculated from how many documents contain the term, but I am confused.
For example, take the string "JosAH is great. JosAH rocks": the TF of "JosAH" would be 2/5. For IDF, suppose there are 2 documents and each document contains the term "JosAH". Do we just check whether the term occurs in the other documents, or do we count how many times it occurs in them?
I'm not entirely sure what you are asking here. Anyway, the purpose of IDF (inverse document frequency) is to dampen the score of very frequent terms and boost the score of infrequent terms.
In your collection of two documents, the IDF of "JosAH" will be 0, since it occurs in all documents: with the usual definition idf = log(N / df), that is log(2 / 2) = 0.
The document frequency is 'the number of documents in the collection that contain a term' (from Introduction to Information Retrieval), so in your words the former option, 'just see if that term occurs'.
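To make that concrete, here is a minimal Java sketch of document frequency and IDF. It is only an illustration under some assumptions: the class and method names (IdfSketch, distinctTerms, documentFrequency, idf) are made up for this example, the tokenizer simply lowercases and splits on non-alphanumeric characters, the second document is invented, and idf is taken as the natural log of N / df with no smoothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class for illustration; not taken from any library.
final class IdfSketch {

    // Returns the set of distinct terms in a document.
    // Assumption: lowercase everything and split on non-letter/digit characters.
    static Set<String> distinctTerms(String document) {
        Set<String> terms = new HashSet<>();
        for (String token : document.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Document frequency: the number of documents that contain the term at least once.
    // How many times the term occurs inside a document does not matter here.
    static int documentFrequency(List<String> documents, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int df = 0;
        for (String document : documents) {
            if (distinctTerms(document).contains(needle)) {
                df++;
            }
        }
        return df;
    }

    // idf = ln(N / df), the unsmoothed form; other variants exist.
    static double idf(List<String> documents, String term) {
        int df = documentFrequency(documents, term);
        if (df == 0) {
            throw new IllegalArgumentException("term does not occur in any document: " + term);
        }
        return Math.log((double) documents.size() / df);
    }

    public static void main(String[] args) {
        // The second document is made up for this example.
        List<String> docs = Arrays.asList(
                "JosAH is great. JosAH rocks",
                "I met JosAH yesterday");

        System.out.println(documentFrequency(docs, "JosAH")); // 2: counted once per document
        System.out.println(idf(docs, "JosAH"));               // 0.0, since ln(2 / 2) = 0
        System.out.println(idf(docs, "rocks"));               // ln(2 / 1), roughly 0.693
    }
}
```

Note that "JosAH" appears twice in the first document but still contributes only 1 to the document frequency. The within-document count belongs to the TF part, not the IDF part.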