I am particularly interested in how one can deal with a huge amount of information for a commercial service like Google Search or Google Maps. We all know they use (or "did" at least) a kind of Linux c
I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents. I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distingui
I need to write a program to scrape forums. Should I write the program in Python using the Scrapy framework or should I use PHP cURL?
I have to create a dataset from some text files, writing them as vectors of features. Something like this:
Do you know of any existing implementation in any language (preferably Python) of an entity set expansion algorithm, such as the one from Google Sets? (http://labs.google.com/sets)
I'm doing a university project that must gather and combine data on a user-provided topic. The problem I've encountered is that Google search results for many terms are polluted with low quality au
Are there any tools or tricks for automatically extracting tables from PDFs? Are there any C# libraries that could do that? Or do you know of other methods for handling this?
I need to retrieve some info from the web. For example, I can visit weather.com and search for my zip code to get an HTML file that contains the temperature or something. I need to make a python sc
I'm looking for some documentation on how Information Retrieval systems (e.g., Lucene) store their indexes for speedy "relevancy" lookups. My Google-fu is failing me: I've found a page which descri
It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words using the first two letters of the word as a key and any words with the two letters sav
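
For the last question, a minimal Python sketch of the two-letter hashmap might look like the following. It assumes the cut-off sentence means that every word starting with a given two-letter prefix is stored in a list under that prefix. The function name `build_prefix_index`, the lower-casing of keys, and the choice to skip words shorter than two letters are illustrative assumptions, not part of the original question.

```python
from collections import defaultdict


def build_prefix_index(words):
    """Group words by their first two letters (hypothetical helper)."""
    index = defaultdict(list)
    for word in words:
        key = word[:2].lower()
        if len(key) < 2:
            # Assumption: words shorter than two letters are left out.
            continue
        index[key].append(word)
    # Return a plain dict so missing prefixes raise KeyError instead of
    # silently creating empty lists.
    return dict(index)


index = build_prefix_index(["apple", "apricot", "banana", "band", "a"])
print(index["ap"])
print(index["ba"])
```

A lookup such as `index.get("ap", [])` then returns the candidate words for a prefix without raising an error when the prefix is absent.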