Generalization functions for Q-Learning

2022-12-08 06:03 问答作者：

I have to do some work with Q Learning, about a guy that has to move furniture around a house (it's basically that). If the house is small enough, I can just have a matrix that represents actions/rewards, but as the house size grows bigger that will not be enough. So I have to use some kind of generalization function for it, instead. My teacher suggests I use not just one, but several ones, so I could compare them and so. What you guys recommend?

I heard that for this situation people are using Support Vector Machines, also Neural Networks. I'm not really inside the field so I can't tell. I had in the past some experience with Neural Networks, but SVM seem a lot harder subject to grasp. Are there any other methods that I should look for? I know 开发者_StackOverflowthere must be like a zillion of them, but I need something just to start.

Thanks

Just as a refresher of terminology, in Q-learning, you are trying to learn the Q-functions, which depend on the state and action:

Q(S,A) = ????

The standard version of Q-learning as taught in most classes tells you that you for each S and A, you need to learn a separate value in a table and tells you how to perform Bellman updates in order to converge to the optimal values.

Now, lets say that instead of table you use a different function approximator. For example, lets try linear functions. Take your (S,A) pair and think of a bunch of features you can extract from them. One example of a feature is "Am I next to a wall," another is "Will the action place the object next to a wall," etc. Number these features f1(S,A), f2(S,A), ...

Now, try to learn the Q function as a linear function of those features

Q(S,A) = w1 * f1(S,A) + w2*f2(S,A) ... + wN*fN(S,A)

How should you learn the weights w? Well, since this is a homework, I'll let you think about it on your own.

However, as a hint, lets say that you have K possible states and M possible actions in each state. Lets say you define K*M features, each of which is an indicator of whether you are in a particular state and are going to take a particular action. So

Q(S,A) = w11 * (S==1 && A == 1) + w12 * (S == 1 && A == 2) + w21 * (S==2 && A==3) ...

Now, notice that for any state/action pair, only one feature will be 1 and the rest will be 0, so Q(S,A) will be equal to the corresponding w and you are essentially learning a table. So, you can think of the standard, table Q-learning as a special case of learning with these linear functions. So, think of what the normal Q-learning algorithm does, and what you should do.

Hopefully you can find a small basis of features, much fewer than K*M, that will allow you to represent your space well.

继续阅读：artificial-intelligence language-agnostic reinforcement-learning

Generalization functions for Q-Learning

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？