I was wondering if there is any chance of R's text mining package having the following feature: myCorpus <- Corpus(DirSource(<directory-containing-textfiles>),control=...)
What are the statistical engines that yield better results than the OpenNLP suite of tools, if any? What I'm looking for is an engine that picks keywords from texts and provides stemmi
I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering.
Are there any libraries/toolkits that would help me in the task of extracting postal address information from unstructured PDF documents (e.g. letters)? If not, how would you approach t
Is there any package for R that allows querying Wikipedia (most probably using the Mediawiki API) to get a list of available articles relevant to such a query, as well as import selected articl
I'm looking for a method to extract a menu used for navigation from a web page heavy with links (and probably text). The pages I'm interested in are quite plain, valid XHTML, and it's a safe assump
I'm developing a NER system on Mallet using CRFs. Do you know if it is possible to collect the feature contributions for each prediction?
I am implementing the Naive Bayes algorithm for text classification. I have ~1000 documents for training and 400 documents for testing. I think I've implemented the training part correctly, but I am confused
I'm building a site that allows users to make sense of a debate by graphically representing arguments for and against a particular issue. (Wrangl)
I am interested in doing a project on document classification and have been looking for books that could be useful for the theoretical parts in text mining related to this or examples o