Generalization functions for Q-Learning
I have to do some work with Q-learning, about a guy who has to move furniture around a house (that's basically it). If the house is small enough, I can just have a matrix that represents actions/rewards, but as the house grows bigger that will not be enough. So I have to use some kind of generalization function for it instead. My teacher suggests I use not just one but several, so I can compare them. What do you guys recommend?
I heard that for this situation people are using Support Vector Machines, and also Neural Networks. I'm not really in the field, so I can't tell. I have some past experience with Neural Networks, but SVMs seem like a much harder subject to grasp. Are there any other methods I should look at? I know there must be a zillion of them, but I need something just to get started.
Thanks
Just as a refresher of terminology, in Q-learning, you are trying to learn the Q-functions, which depend on the state and action:
Q(S,A) = ????
The standard version of Q-learning, as taught in most classes, tells you that for each S and A you need to learn a separate value in a table, and it tells you how to perform Bellman updates in order to converge to the optimal values.
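As a reminder of what that table-based version looks like, here is a minimal sketch of the usual one-step update. The names `ALPHA`, `GAMMA` and `tabular_q_update`, and the way states and actions are encoded, are my own assumptions for illustration, not anything from your assignment.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

def tabular_q_update(q, s, a, reward, s_next, actions):
    """Apply one Q-learning update to the table entry q[(s, a)]."""
    # Value of the best action available in the next state.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    # Bellman target: immediate reward plus discounted future value.
    target = reward + GAMMA * best_next
    # Move the stored value a fraction ALPHA toward the target.
    q[(s, a)] += ALPHA * (target - q[(s, a)])

# The "table": one entry per (state, action) pair, defaulting to 0.
q = defaultdict(float)
tabular_q_update(q, s=0, a="left", reward=1.0, s_next=1, actions=["left", "right"])
```

Every (state, action) pair gets its own entry here, which is exactly what stops scaling once the house gets big.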
Now, let's say that instead of a table you use a different function approximator. For example, let's try linear functions. Take your (S,A) pair and think of a bunch of features you can extract from it. One example of a feature is "Am I next to a wall?", another is "Will the action place the object next to a wall?", etc. Number these features f1(S,A), f2(S,A), ...
Now, try to learn the Q function as a linear function of those features:
Q(S,A) = w1 * f1(S,A) + w2*f2(S,A) ... + wN*fN(S,A)
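To make that concrete, here is a rough sketch of evaluating such a linear Q function. The grid layout, the state encoding as `(row, col)`, the action names and the three features are all made up for illustration; your furniture problem will need its own. It only computes Q for given weights and says nothing about how to learn them.

```python
GRID_SIZE = 5  # hypothetical: the house is a 5x5 grid
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def next_to_wall(pos):
    """True if the cell lies on the outer border of the grid."""
    r, c = pos
    return r in (0, GRID_SIZE - 1) or c in (0, GRID_SIZE - 1)

def f_bias(s, a):
    # Constant feature, so Q can have an offset.
    return 1.0

def f_next_to_wall(s, a):
    # "Am I next to a wall?"
    return 1.0 if next_to_wall(s) else 0.0

def f_action_reaches_wall(s, a):
    # "Will the action place me next to a wall?" (moves are clamped to the grid)
    dr, dc = MOVES[a]
    r = min(max(s[0] + dr, 0), GRID_SIZE - 1)
    c = min(max(s[1] + dc, 0), GRID_SIZE - 1)
    return 1.0 if next_to_wall((r, c)) else 0.0

FEATURES = [f_bias, f_next_to_wall, f_action_reaches_wall]

def q_value(weights, s, a):
    """Q(S,A) = w1*f1(S,A) + w2*f2(S,A) + ... + wN*fN(S,A)."""
    return sum(w * f(s, a) for w, f in zip(weights, FEATURES))

weights = [0.5, 1.0, -2.0]  # arbitrary numbers, not learned values
print(q_value(weights, (2, 2), "up"))
```

Note that the number of weights depends only on the number of features, not on the size of the house.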
How should you learn the weights w? Well, since this is homework, I'll let you think about it on your own.
However, as a hint, let's say that you have K possible states and M possible actions in each state. Let's say you define K*M features, each of which is an indicator of whether you are in a particular state and are going to take a particular action. So
Q(S,A) = w11 * (S==1 && A==1) + w12 * (S==1 && A==2) + w21 * (S==2 && A==1) ...
Now, notice that for any state/action pair, only one feature will be 1 and the rest will be 0, so Q(S,A) will be equal to the corresponding w, and you are essentially learning a table. So you can think of standard, tabular Q-learning as a special case of learning with these linear functions. Think about what the normal Q-learning algorithm does, and what you should do here.
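If you want to convince yourself of that equivalence, a small check like the one below may help. The sizes `K` and `M`, the zero-based numbering of states and actions, and the weight values are all arbitrary assumptions for the sketch.

```python
K = 3  # hypothetical number of states, numbered 0..K-1
M = 2  # hypothetical number of actions, numbered 0..M-1

def indicator_features(s, a):
    """One-hot vector of length K*M: only the entry for (s, a) is 1."""
    feats = [0.0] * (K * M)
    feats[s * M + a] = 1.0
    return feats

def q_linear(weights, s, a):
    """Linear Q: dot product of the weights with the indicator features."""
    return sum(w * f for w, f in zip(weights, indicator_features(s, a)))

# Arbitrary weights, one per (state, action) pair.
weights = [float(i) for i in range(K * M)]

# The same numbers laid out as a K-by-M lookup table.
table = [[weights[s * M + a] for a in range(M)] for s in range(K)]

# The linear function and the table agree on every pair.
all_equal = all(
    q_linear(weights, s, a) == table[s][a]
    for s in range(K)
    for a in range(M)
)
print(all_equal)
```

With indicator features, each weight is simply one table cell, so whatever you do to the weights had better reduce to the table update in this case.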
Hopefully you can find a small basis of features, far fewer than K*M, that will allow you to represent your space well.