1.2.2 Choosing the target function
The next step is to determine exactly what type of knowledge will be learned and how the performance program will use it.
Let’s begin with the legal moves a bot can take. Legal moves are the moves our bot (the model) is allowed to make under the rules of the game. The bot needs to learn to choose the best move among these legal moves in any given situation.
Let’s call this function ChooseMove; it chooses the best move for the bot.
ChooseMove : B → M
which takes a board state from the set of legal board states B as input and outputs the best move from the set of legal moves M.
To improve performance P at this task with experience E, we assign a numerical score to each board state using a target function, TargetFunction (V).
TargetFunction (V): B → R
V maps any legal board state in B to a real value in R, and we intend for V to assign higher scores to better board states.
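One common way to use such a V is to score the board that each legal move leads to and pick the move with the highest score. The Python sketch below illustrates that idea under stated assumptions: the helper names (legal_moves, apply_move, v) and the toy number-line game are hypothetical and are not part of the original text.

```python
from typing import Callable, Iterable, List, TypeVar

Board = TypeVar("Board")
Move = TypeVar("Move")


def choose_move(
    board: Board,
    legal_moves: Callable[[Board], Iterable[Move]],
    apply_move: Callable[[Board, Move], Board],
    v: Callable[[Board], float],
) -> Move:
    """Return the legal move whose successor board has the highest v score."""
    return max(legal_moves(board), key=lambda m: v(apply_move(board, m)))


# Toy illustration (hypothetical game, assumed for this sketch only):
# the "board" is an integer position, a move adds a step to it,
# and v prefers positions that are close to 10.
def toy_legal_moves(pos: int) -> List[int]:
    return [-1, 1, 2]


def toy_apply_move(pos: int, step: int) -> int:
    return pos + step


def toy_v(pos: int) -> float:
    return -abs(10 - pos)


best = choose_move(7, toy_legal_moves, toy_apply_move, toy_v)
print(best)  # 2, because position 9 is the closest reachable position to 10
```

Passing the game rules and the scoring function in as arguments keeps the move selection separate from how V is obtained, so a learned approximation of V could be swapped in later without changing choose_move.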
- if b is a final state that is won, V(b) = 100
- if b is a final state that is lost, V(b) = -100
- if b is a final state that is drawn, V(b) = 0
- if b is not a final state, V(b) = V(b’)
where b’ is the best final state that can be achieved starting from b and playing optimally until the end of the game.
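The recursive definition of V can be computed exactly for a game small enough to search completely. The sketch below uses tic-tac-toe as a stand-in game. It is an illustration under assumptions, not the method of the original text: the bot plays X, X moves first, the opponent is also assumed to play optimally, and all helper names are hypothetical.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, indexed 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board):
    """X moves first, so X is to move whenever the piece counts are equal."""
    return 'X' if board.count('X') == board.count('O') else 'O'


def successors(board):
    """Return every board reachable in one legal move."""
    player = to_move(board)
    return [board[:i] + (player,) + board[i + 1:]
            for i, cell in enumerate(board) if cell == ' ']


@lru_cache(maxsize=None)
def V(board):
    """Score a board from the point of view of the bot, which plays X."""
    w = winner(board)
    if w == 'X':
        return 100       # final state, won
    if w == 'O':
        return -100      # final state, lost
    nxt = successors(board)
    if not nxt:
        return 0         # final state, draw
    # Not a final state: V(b) = V(b'), the best final state reachable
    # when both sides play optimally (assumption of this sketch).
    values = [V(s) for s in nxt]
    return max(values) if to_move(board) == 'X' else min(values)


empty = (' ',) * 9
x_can_win = ('X', 'X', ' ',
             'O', 'O', ' ',
             ' ', ' ', ' ')
print(V(empty))      # 0: tic-tac-toe is a draw under optimal play
print(V(x_can_win))  # 100: X completes the top row
```

This exhaustive search is only practical because tic-tac-toe has a few thousand reachable boards. For a game with a much larger state space, the same definition still specifies V but cannot be evaluated this way.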