
Machine learning to predict classification

I have the following problem. I have a training dataset consisting of a range of numbers. Each number belongs to a certain class. There are five classes.

Range: 1...10

Training Dataset: {1,5,6,6,10,2,3,4,1,8,6,...}

Classes: [1,2][3,4][5,6][7,8][9,10]

Is it possible to use a machine learning algorithm to find likelihoods for class prediction, and which algorithm would be suited for this?

best, US


If, as described in the question's comment, the goal is to "calculate the likelihood of a certain class to appear based on the given distribution of the training set",
the problem is trivial and hardly a machine learning one:
simply count the number of occurrences of each class in the "training set": Count_12, Count_34, ..., Count_910. The likelihood that a given class xy appears is then given by

   P(xy) = Count_xy  / Total Number of elements in the "training set"
         = Count_xy  / (Count_12 + Count_34 + Count_56 + Count_78 + Count_910)
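A minimal sketch of this counting approach, assuming Python; the sample list is just the prefix shown in the question, and the code_to_class helper is illustrative, not something from the original post:

    from collections import Counter

    # Training data from the question (the "..." is truncated; only the shown prefix is used here).
    training = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # Hypothetical helper: map a code 1..10 to its class, represented as a (low, high) pair.
    def code_to_class(code):
        low = code if code % 2 == 1 else code - 1
        return (low, low + 1)

    counts = Counter(code_to_class(c) for c in training)
    total = sum(counts.values())

    # P(xy) = Count_xy / total number of elements in the "training set"
    for cls, n in sorted(counts.items()):
        print(f"P{cls} = {n / total:.3f}")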

A more interesting problem...
...would be to consider the training set as a sequence and to guess what the next item in that sequence would be. The probability that the next item comes from a given category would then not only be based on the prior for that category (the P(xy) computed above), but would also take into account the items which precede it in the sequence. One of the interesting parts of this problem would then be to figure out how "far back" to look and how much "weight" to give to the preceding sequences of items.
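For instance, a first-order Markov model (looking back exactly one item) is one simple way to condition on the preceding sequence. A minimal sketch, assuming Python and a look-back of one; the sample sequence is again just the prefix shown in the question:

    from collections import Counter, defaultdict

    sequence = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # transitions[prev][nxt] counts how often code `nxt` immediately follows code `prev`.
    transitions = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev][nxt] += 1

    def next_code_probs(prev):
        """Estimate P(next code | previous code) from the observed transitions."""
        counts = transitions[prev]
        total = sum(counts.values())
        return {code: n / total for code, n in counts.items()}

    print(next_code_probs(6))  # distribution over codes seen right after a 6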

Edit (now that the OP has indicated his/her interest in the "more interesting problem").
This "prediction-given-preceding-sequence" problem maps almost directly to the
machine-learning-algorithm-for-predicting-order-of-events StackOverflow question.
The slight differences are that the alphabet here has 10 distinct codes (4 in the other question) and the fact that here we try to predict a class of codes, rather than just the code itself. With regard to this aggregation of, here, 2 codes per class, we have several options:

  • work with classes from the start, i.e. replace each code read in the sequence by its class, and only consider and keep track of classes from then on.
  • work with codes only, i.e. create a predictor of 1-thru-10 codes, and only consider the class at the very end, adding the probabilities of the two codes which comprise a class to produce the likelihood of the next item being of that class.
  • some hybrid solution: consider / work with the codes but sometimes aggregate to the class.

My personal choice would be to try the code predictor first (only aggregating at the very end), and perhaps adapt from there if insight gained from this initial attempt suggested that the logic or its performance could be simplified or improved by aggregating earlier. Indeed, the very same predictor could be used to try both approaches: one would simply need to alter the input stream, replacing each even number by the odd number that precedes it. My guess is that valuable information (for the purpose of guessing upcoming codes) is lost when we aggregate early.
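A minimal sketch of the two input variants (early vs. late aggregation), assuming Python; the class_probability helper and the example probabilities are purely illustrative:

    sequence = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

    # Variant 1: aggregate early -- replace each even code by the odd code preceding it,
    # so the same predictor effectively works on classes (represented by their odd member).
    class_stream = [c if c % 2 == 1 else c - 1 for c in sequence]

    # Variant 2: aggregate late -- predict raw codes, then sum the probabilities
    # of the two codes that make up a class.
    def class_probability(code_probs, low_code):
        """P(class) = P(low_code) + P(low_code + 1), using a code-level predictor's output."""
        return code_probs.get(low_code, 0.0) + code_probs.get(low_code + 1, 0.0)

    # Example: if the code predictor says P(5)=0.2 and P(6)=0.3, then P([5,6])=0.5.
    print(class_probability({5: 0.2, 6: 0.3}, 5))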
