
Machine learning to predict classification

I have the following problem. I have a training dataset consisting of a range of numbers. Each number belongs to a certain class. There are five classes.

Range: 1...10

Training Dataset: {1,5,6,6,10,2,3,4,1,8,6,...}

Classes: [1,2][3,4][5,6][7,8][9,10]

Is it possible to use a machine learning algorithm to find likelihoods for class prediction, and which algorithm would be suited for this?

best, US


If, as described in the question's comment, the goal is to "calculate the likelihood of a certain class to appear based on the given distribution of the training set",
the problem is trivial and hardly a machine learning one:
simply count the number of occurrences of each class in the "training set": Count_12, Count_34, ..., Count_910. The likelihood that a given class xy appears is then given by

   P(xy) = Count_xy  / Total Number of elements in the "training set"
         = Count_xy  / (Count_12 + Count_34 + Count_56 + Count_78 + Count_910)
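A minimal sketch of this counting approach, assuming Python; the sample list is just the prefix shown in the question, and the code_to_class helper is illustrative, not something from the original post:

    from collections import Counter

    # Training data from the question (the "..." is truncated; only the shown prefix is used here).
    training = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # Hypothetical helper: map a code 1..10 to its class, represented as a (low, high) pair.
    def code_to_class(code):
        low = code if code % 2 == 1 else code - 1
        return (low, low + 1)

    counts = Counter(code_to_class(c) for c in training)
    total = sum(counts.values())

    # P(xy) = Count_xy / total number of elements in the "training set"
    for cls, n in sorted(counts.items()):
        print(f"P{cls} = {n / total:.3f}")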

A more interesting problem...
...would be to consider the training set as a sequence and to guess what the next item in that sequence would be. The probability that the next item comes from a given category would then not only be based on the prior for that category (the P(xy) computed above), but would also take into account the items which precede it in the sequence. One of the interesting parts of this problem would then be to figure out how "far back" to look and how much "weight" to give to the preceding sequences of items.
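For instance, a first-order Markov model (looking back exactly one item) is one simple way to condition on the preceding sequence. A minimal sketch, assuming Python and a look-back of one; the sample sequence is again just the prefix shown in the question:

    from collections import Counter, defaultdict

    sequence = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # transitions[prev][nxt] counts how often code `nxt` immediately follows code `prev`.
    transitions = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev][nxt] += 1

    def next_code_probs(prev):
        """Estimate P(next code | previous code) from the observed transitions."""
        counts = transitions[prev]
        total = sum(counts.values())
        return {code: n / total for code, n in counts.items()}

    print(next_code_probs(6))  # distribution over codes seen right after a 6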

Edit (now that the OP has indicated his/her interest in the "more interesting problem").
This "prediction-given-preceding-sequence" problem maps almost directly to the
machine-learning-algorithm-for-predicting-order-of-events StackOverflow question.
The slight differences are that the alphabet here has 10 distinct codes (4 in the other question) and the fact that here we try to predict a class of codes, rather than just the code itself. With regard to this aggregation of, here, 2 codes per class, we have several options:

  • work with classes from the start, i.e. replace each code read in the sequence by its class, and only consider and keep track of classes from then on.
  • work with codes only, i.e. create a predictor of 1-thru-10 codes, and only consider the class at the very end, adding the probabilities of the two codes which comprise a class to produce the likelihood of the next item being of that class.
  • some hybrid solution: consider / work with the codes but sometimes aggregate to the class.

My personal choice would be to try the code predictor first (only aggregating at the very end), and perhaps adapt from there if insight gained from this initial attempt suggested that the logic or its performance could be simplified or improved by aggregating earlier. Indeed, the very same predictor could be used to try both approaches: one would simply need to alter the input stream, replacing each even number by the odd number that precedes it. My guess is that valuable information (for the purpose of guessing upcoming codes) is lost when we aggregate early.
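A minimal sketch of the two input variants (early vs. late aggregation), assuming Python; the class_probability helper and the example probabilities are purely illustrative:

    sequence = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # Variant 1: aggregate early -- replace each even code by the odd code preceding it,
    # so the same predictor effectively works on classes (represented by their odd member).
    class_stream = [c if c % 2 == 1 else c - 1 for c in sequence]

    # Variant 2: aggregate late -- predict raw codes, then sum the probabilities
    # of the two codes that make up a class.
    def class_probability(code_probs, low_code):
        """P(class) = P(low_code) + P(low_code + 1), using a code-level predictor's output."""
        return code_probs.get(low_code, 0.0) + code_probs.get(low_code + 1, 0.0)

    # Example: if the code predictor says P(5)=0.2 and P(6)=0.3, then P([5,6])=0.5.
    print(class_probability({5: 0.2, 6: 0.3}, 5))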
