Is there an easy way to save a Google Ngram result http://books.google.com/ngrams/ as a CSV? So that I get a list like
I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing whic
I'm using NLTK to search for n-grams in a corpus but it's taking a very long time in some cases. I've noticed calculating n-grams isn't an uncommon feature in other packages (apparently Haystack h
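When NLTK's helpers feel slow on a large corpus, a sliding window over an already-tokenized list avoids most of the overhead. A minimal sketch in plain Python (assuming you have the tokens as a list; this mirrors what `nltk.ngrams` plus `collections.Counter` would produce, without the library calls per token):

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count n-grams by zipping n staggered views of the token list.

    tokens[0:], tokens[1:], ... tokens[n-1:] line up so that each
    zipped tuple is one n-gram; Counter tallies them in one pass.
    """
    return Counter(zip(*(tokens[i:] for i in range(n))))

tokens = "to be or not to be".split()
bigrams = count_ngrams(tokens, 2)
# ("to", "be") occurs twice in this toy sentence
```

The staggered-slice trick copies the token list n times; for very large corpora an index-based loop over `range(len(tokens) - n + 1)` trades that memory for a little speed.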
I want to scan through a huge corpus of text and count word frequencies (n-gram frequencies actually, for those who are familiar with NLP/IR). I use a Java HashMap for this. So what happens is I process
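The hash-map-of-counts approach works for a huge corpus as long as the text is streamed rather than loaded whole: memory is then bounded by the vocabulary, not the corpus. A sketch of the same idea in Python (for illustration; the Java version would use `HashMap.merge` the same way):

```python
from collections import Counter

def stream_word_counts(lines):
    """Accumulate word frequencies one line at a time.

    `lines` can be any iterable of strings, e.g. an open file
    handle, so the whole corpus never sits in memory at once.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

counts = stream_word_counts(["to be or", "not to be"])
# counts["to"] == 2, counts["be"] == 2
```

If even the vocabulary (or the set of distinct n-grams, which grows much faster) exceeds memory, the usual next step is to periodically spill sorted partial counts to disk and merge them, rather than keeping one ever-growing map.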
I've been working on a project to data-mine a large amount of short texts and categorize these based on a pre-existing large list of category names. To do this I had to figure out how to first create
What's the best way to extract keyphrases from a block of text? I'm writing a tool to do keyword extraction: something like this. I've found a few libraries for Python and Perl to extract n-grams,
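A crude but serviceable baseline for keyphrase extraction is to rank n-grams by frequency after discarding those that begin or end with a stopword. A hedged sketch (the stopword list and scoring are placeholders; real tools use TF-IDF, RAKE, or similar scoring instead of raw counts):

```python
from collections import Counter

# Tiny illustrative stopword list; swap in a real one (e.g. NLTK's).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}

def keyphrases(text, n=2, top=5):
    """Return the `top` most frequent n-grams whose boundary words
    are not stopwords. Frequency-only scoring: a baseline, not a
    substitute for proper keyphrase ranking."""
    tokens = [t.lower() for t in text.split()]
    grams = zip(*(tokens[i:] for i in range(n)))
    good = [g for g in grams
            if g[0] not in STOPWORDS and g[-1] not in STOPWORDS]
    return [" ".join(g) for g, _ in Counter(good).most_common(top)]
```

Calling `keyphrases("natural language processing makes natural language processing fun", top=1)` surfaces the repeated bigram "natural language".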
I am trying to do some pattern 'mining' in a piece of multi-word text on each line. I have done the N-gram analysis using the Text::Ngrams module in Perl, which gives me the frequency of each word. I am how
I am currently using what I (mistakenly) thought would be a fairly straightforward implementation of Solr's NGramTokenizerFactory, but I'm getting strange results that are inconsistent between the a
I'm trying to create an application which uses trigrams for approximate string matching. Now all the records are in the database and I want to be able to search the records on a fixed column. Is it b
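The core of trigram-based approximate matching is comparing the sets of character trigrams two strings share. A minimal sketch using Jaccard similarity (the leading/trailing padding follows the convention PostgreSQL's pg_trgm uses; in production you would let the database index the trigrams rather than compare in application code):

```python
def trigrams(s):
    """Set of character trigrams, padded so word boundaries count.

    "  hello " yields "  h", " he", "hel", "ell", "llo", "lo ".
    """
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard similarity of trigram sets: 1.0 for identical strings,
    0.0 when no trigrams are shared."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# similarity("hello", "hallo") is well above similarity("hello", "world")
```

Searching a fixed column then means indexing each row's trigram set and ranking candidate rows by this score; doing that inside the database (e.g. a trigram index) avoids pulling every record into the application.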
I have an ARPA LM generated by kylm; when running Sphinx I get this exception stack trace: Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed