reinforcement-learning | Developer

What machine learning algorithm should I use for Connect 4?
I have an AI that is good at playing Connect 4 (using minimax). Now I want to use some machine learning algorithm to learn from this AI that I have, and I would like to do that by just letting them pl
Views: 5
XOR Hebbian test/example neural network
I just finished writing some code that runs a Hebbian learning feedforward neural network. I've done a backpropagation neural network before and the first thing I did to make sure it w
Views: 6
Are neural networks really abandonware?
I am planning to use neural networks for approximating a value function in a reinforcement learning algorithm. I want to do that to introduce some generalization and flexibility on how I represent sta
Views: 6
How to Learn the Reward Function in a Markov Decision Process
What's the appropriate way to update your R(s) function during Q-learning? For example, say an agent visits state s1 five times, and receives rewards [0,0,1,1,0]. Should I calcula
Views: 7
C++ Reinforcement learning and smart pointers
I am doing my Master's project on robotics sensorimotor online learning using reinforcement learning methods (Q, SARSA, TD(λ), Actor-Critic, R, etc.). I am currently designing the framework on which both
Views: 6
How to train an artificial neural network to play Diablo 2 using visual input?
I'm currently trying to get an ANN to play a video game and I was hoping to get some help from the wonderful community here.
Views: 5
SARSA algorithm
I am having trouble understanding the SARSA algorithm: http://en.wikipedia.org/wiki/SARSA In particular, when updating the Q value, what is gamma? And what values are used fo
Views: 5
Reducing the number of markov-states in reinforcement learning
I've started toying with reinforcement learning (using the Sutton book). What I fail to fully understand is the paradox between having to reduce the Markov state space while on the other hand not making a
Views: 6
TD(λ) in Delphi/Pascal (Temporal Difference Learning)
I have an artificial neural network which plays Tic-Tac-Toe, but it is not complete yet. What I have so far:
Views: 4
Implementing HexQ Algorithm [closed]
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Views: 5
