
Lucene: how to build a term-doc matrix

I need to build that matrix, but I can't find a way to compute the normalized tf-idf for each cell. The normalization I want is cosine normalization, that is, multiplying each tf-idf value (computed using DefaultSimilarity) by 1/sqrt(sum of the squared tf-idf values in the column).
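Written out, the requested weighting looks roughly like this. Here t is a term (row), d a document (column) and N the number of documents; the tf and idf definitions are those of Lucene 3.x's DefaultSimilarity, which is an assumption about the Lucene version in use:

```latex
w_{t,d} = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t), \qquad
\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{N}{\mathrm{docFreq}(t) + 1}

\hat{w}_{t,d} = \frac{w_{t,d}}{\sqrt{\sum_{t'} w_{t',d}^{2}}}
```

After this step every non-empty column has unit Euclidean length, so the dot product of two columns is their cosine similarity.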

Does anyone know a way to perform that?

Thanks in advance

Antonio


One way, not using Lucene, is described in Sujit Pal's blog. Alternatively, you can build a Lucene index that stores term vectors for the field, iterate over the terms to get each term's idf, and then iterate over each term's documents to get its tf.
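A minimal, self-contained sketch of the second approach in plain Java (Java 9+ for `List.of`/`Map.of`). It does not call Lucene: each document's raw term counts are passed in as a map, standing in for what its term vector would hold. tf and idf follow the formulas of Lucene 3.x's `DefaultSimilarity` (tf = sqrt(freq), idf = ln(numDocs/(docFreq+1)) + 1), computed in `double` rather than Lucene's `float`. The class `TermDocMatrix` and its methods are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TermDocMatrix {

    // tf as in Lucene 3.x DefaultSimilarity: square root of the raw frequency.
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf as in Lucene 3.x DefaultSimilarity: ln(numDocs / (docFreq + 1)) + 1.
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Returns matrix[term][doc] of cosine-normalized tf-idf weights.
    // Each map holds one document's raw term frequencies (a stand-in for its term vector).
    public static double[][] build(List<String> terms, List<Map<String, Integer>> docs) {
        int numTerms = terms.size();
        int numDocs = docs.size();
        double[][] m = new double[numTerms][numDocs];

        // Pass 1: for each term, count the documents containing it, then fill its row with tf * idf.
        for (int t = 0; t < numTerms; t++) {
            String term = terms.get(t);
            int docFreq = 0;
            for (Map<String, Integer> doc : docs) {
                if (doc.getOrDefault(term, 0) > 0) {
                    docFreq++;
                }
            }
            double idf = idf(docFreq, numDocs);
            for (int d = 0; d < numDocs; d++) {
                m[t][d] = tf(docs.get(d).getOrDefault(term, 0)) * idf;
            }
        }

        // Pass 2: cosine normalization, dividing each column (document) by its L2 norm.
        for (int d = 0; d < numDocs; d++) {
            double sumSq = 0.0;
            for (int t = 0; t < numTerms; t++) {
                sumSq += m[t][d] * m[t][d];
            }
            if (sumSq == 0.0) {
                continue; // document contains none of the terms; leave its column at zero
            }
            double norm = Math.sqrt(sumSq);
            for (int t = 0; t < numTerms; t++) {
                m[t][d] /= norm;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
                Map.of("lucene", 2, "index", 1),
                Map.of("lucene", 1, "matrix", 3),
                Map.of("matrix", 1));
        List<String> terms = List.of("index", "lucene", "matrix");
        double[][] m = build(terms, docs);
        for (int t = 0; t < terms.size(); t++) {
            System.out.println(terms.get(t) + " " + Arrays.toString(m[t]));
        }
    }
}
```

The two passes mirror the answer: idf needs a per-term document count, while the normalization needs every weight in a column, so it can only run once the whole matrix is filled. With a real Lucene 3.x index, the maps could presumably be filled from `IndexReader.getTermFreqVector(docId, field)` and its `getTerms()`/`getTermFrequencies()` arrays; check that against the Lucene version you use.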
