Representing a document as a vector with Lucene
I want to build document vectors for SVM text categorization. I have indexed my documents into two classes, POSITIVE and NEGATIVE, and I selected my feature space with the IG method.
How can I represent a document as a vector of tf-idf term weights using Lucene?
Thanks!
Best regards!
Apache Mahout is a machine learning library written in Java. It has utilities for creating document vectors from a Lucene index (built from raw text). You can adapt that code to your requirements.
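If you would rather compute the weights yourself, here is a minimal sketch of the tf-idf step in plain Java. It does not call Lucene or Mahout; it assumes you have already read the counts out of the index, for example per-document term frequencies from stored term vectors, and document frequencies and the document count from `IndexReader.docFreq(Term)` and `IndexReader.numDocs()`. The class name `TfIdfVectorizer`, the method `vectorize`, and the sample numbers are made up for illustration, and the weighting used is the simple tf * ln(N / df) variant, which may differ from what Lucene's own scoring or Mahout uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TfIdfVectorizer {

    /**
     * Builds a dense tf-idf vector for one document.
     *
     * @param features  the selected feature terms, in a fixed order (one vector slot per term)
     * @param termFreqs term -> number of occurrences in this document
     * @param docFreqs  term -> number of documents in the collection containing the term
     * @param numDocs   total number of documents in the collection
     */
    public static double[] vectorize(List<String> features,
                                     Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        double[] vector = new double[features.size()];
        for (int i = 0; i < features.size(); i++) {
            String term = features.get(i);
            int tf = termFreqs.getOrDefault(term, 0);
            int df = docFreqs.getOrDefault(term, 0);
            // Terms absent from the document (or from the collection) keep weight 0.
            if (tf > 0 && df > 0) {
                vector[i] = tf * Math.log((double) numDocs / df);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical feature space and counts, standing in for values read from the index.
        List<String> features = List.of("good", "bad", "movie");
        Map<String, Integer> termFreqs = Map.of("good", 2, "movie", 1);
        Map<String, Integer> docFreqs = Map.of("good", 5, "bad", 4, "movie", 10);

        double[] vector = vectorize(features, termFreqs, docFreqs, 10);
        System.out.println(Arrays.toString(vector));
    }
}
```

Keeping the feature list in a fixed order means every document maps to the same dimensions, which is what an SVM trainer needs. With a large feature space you would probably want to emit only the non-zero entries in a sparse format instead of a dense array.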