Hi, I download a large amount of files for data mining. I used to use PHP for this purpose but I am finding it to be too slow. Also I just want a small part of the web page. I want to achieve two thi
I am trying to figure out an optimal solution (in Java) to the following problem: In a first pass over some data, I count the number of occurrences of an item.Basically, I create a HashMap from item
I\'m building a system which will collect data about an industrial process, which is externally controlled. Those datas will be used to build usage statistics for various components of the system.
Say I have a huge (a few million) list of n vectors, given a new vec开发者_如何学运维tor, I need to find a pretty close one from the set but it doesn\'t need to be the closest. (Nearest Neighbor finds
I am making an application that organizes a set of documents (ranging in number from a minimum of ~10 documents to a maximum of ~2000) into groups, based开发者_C百科 on the word/phrase content of each
I would like to classify text documents into four categories. Also I have lot of samples which are already classified that c开发者_如何学Can be used for training. I would like the algorithm to learn o
There are three ways to measure i开发者_StackOverflowmpurity: What are the differences and appropriate use cases for each method?If the p_i\'s are very small, then doing multiplication on very
Given a vector of scores 开发者_开发知识库and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?
I need a review for my solution for sampling 100 random rows from a table stored on a MPP machine (currently Netezza , later might be hadoop/etc.)
I\'m looking for a way to find common phrases within a body of text using PHP. If it\'s not possible in php, I\'d be interested in other web languages that would help me complete this.