开发者

Book and article references sought for starting out with document classification

I am interested in doing a project on document classification and have been looking for books that could be useful for the theoretical parts in text mining re开发者_JAVA技巧lated to this or examples of articles describing the process of going from training data with documents classified (with subcategories) to a system which predicts the class of a document. There seem to be some (rather expensive!) titles available but these are conference proceedings with articles on smaller very specific topics. Can someone suggest books from the data mining literature that provides a good theoretical basis for a project on text mining, specifically document classification or articles with an overview of this process ?


Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze have a free information retrieval book. Try chapter 13 - Text classification & Naive Bayes.

See also the companion site for Manning and Schütze's nlp book, specifically links for the text categorization chapter.

Fabrizio Sebastiani wrote a useful tutorial about text categorization(PDF) and review paper of machine learning for text categorization (PDF).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜