
Dimensionality reduction of a TF-IDF matrix

I have computed a TF-IDF (term frequency, inverse document frequency) matrix, and I have read that the next step is to reduce its dimensionality using methods such as LSI or the chi-square test.

I have no idea how to implement the chi-square test in Java for reducing the dimensionality of a TF-IDF matrix. If there is a library that does this, or a tutorial that explains how to do it, please let me know.


Use the gensim library (Python) for LSA and LDA. It can perform LSA on practically any large dataset, because it does not load the entire corpus into memory at once but reads it lazily.


I don't think you want to do chi-square; that's not a technique for dimension reduction.
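For context, chi-square in text processing is normally a supervised term-selection score: it ranks each term by how strongly its presence depends on a class label, so it needs labeled documents and does not project the matrix into a new space. Below is a minimal sketch of that score in plain Java. The class name `ChiSquareTermScores`, the binary `inClass` labels, and the rule "a term is present when its TF-IDF weight is above zero" are all assumptions made for illustration; none of them come from the question.

```java
/** Illustrative chi-square score per term against a binary class label. */
final class ChiSquareTermScores {

    /**
     * tfidf: rows = documents, columns = terms.
     * inClass: hypothetical binary label for each document.
     * Returns one chi-square score per term from its 2x2 contingency table.
     */
    static double[] score(double[][] tfidf, boolean[] inClass) {
        int docs = tfidf.length;
        int terms = tfidf[0].length;
        double[] scores = new double[terms];
        for (int t = 0; t < terms; t++) {
            double n11 = 0, n10 = 0, n01 = 0, n00 = 0;
            for (int d = 0; d < docs; d++) {
                boolean present = tfidf[d][t] > 0.0; // assumption: weight > 0 means "term occurs"
                if (present && inClass[d]) {
                    n11++;          // term present, document in class
                } else if (present) {
                    n10++;          // term present, document not in class
                } else if (inClass[d]) {
                    n01++;          // term absent, document in class
                } else {
                    n00++;          // term absent, document not in class
                }
            }
            double diff = n11 * n00 - n10 * n01;
            double denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00);
            // A zero denominator means the term or the class is constant; score it 0.
            scores[t] = denom == 0.0 ? 0.0 : docs * diff * diff / denom;
        }
        return scores;
    }

    public static void main(String[] args) {
        double[][] tfidf = {
            {0.5, 0.2},
            {0.7, 0.0},
            {0.0, 0.3},
            {0.0, 0.0}
        };
        boolean[] inClass = {true, true, false, false};
        System.out.println(java.util.Arrays.toString(score(tfidf, inClass)));
    }
}
```

Keeping only the highest-scoring columns would shrink the matrix, but only when class labels are available; without labels the score cannot be computed at all.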

What you want to do is SVD, or singular value decomposition. That is the technique used in LSI/LSA for dimensionality reduction.
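To make the idea concrete, here is a minimal sketch of a truncated SVD in plain Java. It finds the top k right singular vectors by power iteration with deflation and projects each document row onto them. The class name `TruncatedSvdSketch`, the rows-are-documents layout, the deterministic start vector, and the fixed iteration count are assumptions for illustration; a real LSA pipeline would normally rely on a tested linear algebra library instead.

```java
/** Illustrative truncated SVD via power iteration with deflation. */
final class TruncatedSvdSketch {

    /** Returns the top-k right singular vectors of a (rows = documents, columns = terms) as a k x n array. */
    static double[][] topRightSingularVectors(double[][] a, int k, int iterations) {
        int n = a[0].length;
        double[][] vs = new double[k][];
        for (int c = 0; c < k; c++) {
            // Deterministic, non-uniform start vector; a random start is the more common choice.
            double[] v = new double[n];
            for (int j = 0; j < n; j++) {
                v[j] = 1.0 + 0.01 * (j + 1) * (c + 1);
            }
            removeComponents(v, vs, c);
            normalize(v);
            for (int it = 0; it < iterations; it++) {
                double[] w = multiplyTransposed(a, multiply(a, v)); // w = A^T (A v)
                removeComponents(w, vs, c);                         // deflate directions already found
                if (normalize(w) == 0.0) {
                    break; // rank of a is below k; keep the last usable vector
                }
                v = w;
            }
            vs[c] = v;
        }
        return vs;
    }

    /** Projects every row of a onto the given vectors, giving an m x k reduced matrix. */
    static double[][] project(double[][] a, double[][] vs) {
        double[][] out = new double[a.length][vs.length];
        for (int i = 0; i < a.length; i++) {
            for (int c = 0; c < vs.length; c++) {
                out[i][c] = dot(a[i], vs[c]);
            }
        }
        return out;
    }

    private static double[] multiply(double[][] a, double[] v) {
        double[] u = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            u[i] = dot(a[i], v);
        }
        return u;
    }

    private static double[] multiplyTransposed(double[][] a, double[] u) {
        double[] w = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < w.length; j++) {
                w[j] += a[i][j] * u[i];
            }
        }
        return w;
    }

    /** Subtracts from v its components along the first count (unit-length) basis vectors. */
    private static void removeComponents(double[] v, double[][] basis, int count) {
        for (int p = 0; p < count; p++) {
            double d = dot(v, basis[p]);
            for (int j = 0; j < v.length; j++) {
                v[j] -= d * basis[p][j];
            }
        }
    }

    /** Scales v to unit length in place and returns its original norm. */
    private static double normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        if (norm > 0.0) {
            for (int j = 0; j < v.length; j++) {
                v[j] /= norm;
            }
        }
        return norm;
    }

    private static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int j = 0; j < x.length; j++) {
            s += x[j] * y[j];
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] tfidf = {
            {0.9, 0.1, 0.0, 0.0},
            {0.8, 0.2, 0.0, 0.1},
            {0.0, 0.0, 0.7, 0.6}
        };
        double[][] reduced = project(tfidf, topRightSingularVectors(tfidf, 2, 200));
        for (double[] row : reduced) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Each row of `reduced` is a document expressed in k latent dimensions instead of one dimension per term. The signs of the components are arbitrary, and power iteration converges slowly when neighbouring singular values are close, which is one more reason to prefer a library routine for real data.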

Wikipedia suggests a library called the 'S-Space Package' for LSA in Java. I haven't used it myself, but you may want to look into it.

http://code.google.com/p/airhead-research/
