In the classical supervised learning model, one chooses between possible scenarios that are external to the subject, e.g. rainy day vs. non-rainy day. In reinforcement learning, the agent has to make choices itself, moving from one state to another. It then becomes a matter of keeping track of the outcome of every state and action pair. The Markov chain becomes a Markov decision process.
Our goal, then, is to learn a function Q(state, action) that estimates the reward we can expect from taking a given action in a given state.
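As a rough sketch (my own naming, not necessarily the assignment's), the standard Q-learning update nudges the old estimate toward the new evidence, with alpha controlling the step size:

def update_q(q, state, action, reward, best_future, alpha=0.5):
    # q is a dictionary keyed by (state, action); unseen pairs default to 0.
    old = q.get((state, action), 0)
    # Move the old estimate toward (reward + best future reward) by a step of alpha.
    q[(state, action)] = old + alpha * (reward + best_future - old)
    return q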
It seems appropriate, then, to implement greedy decision-making, i.e. always choose the move that maximizes the expected reward. But a purely greedy agent might get stuck, never exploring enough to find the optimal decision. One thus needs to reintroduce an element of randomness so that the robot/agent occasionally moves at random (epsilon-greedy).
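Something like this (again, names are my own for illustration): with probability epsilon pick a random legal action, otherwise pick the best-known one.

import random

def choose_action(q, state, actions, epsilon=0.1):
    # Explore: occasionally try a random legal action.
    if random.random() < epsilon:
        return random.choice(list(actions))
    # Exploit: otherwise take the action with the highest known Q value.
    return max(actions, key=lambda a: q.get((state, a), 0))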
The lecture illustrates the idea, and the assignment is to complete some of the Nim functions... (Cute game: each player removes as many dots as desired from a single row on each move; whoever removes the last dot loses.)
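One natural way to represent the game (an assumption on my part, not necessarily the distribution code's representation) is a list of row sizes, with an action being a (row, count) pair:

def available_actions(piles):
    # All legal moves: pick a row i and remove between 1 and piles[i] dots.
    actions = set()
    for i, pile in enumerate(piles):
        for count in range(1, pile + 1):
            actions.add((i, count))
    return actions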
The computer then trains by playing 10,000 games against itself, with alpha at 0.5 (the factor for updating the value of a move from new information) and epsilon at 0.10 (how often a random move will be tried). The computer should be a formidable opponent after this practice...
In the case of Nim, it's pretty cut and dried that the computer can keep track of everything, since the state space is small enough to enumerate. In a larger game space, one might instead use approximations that recognize patterns rather than storing a value for every state.