This week I split my time between researching reinforcement learning and setting up the opponent and game platform needed to train the RL model.
On the research side, I want to understand as much as possible about reinforcement learning before I start building a Mancala RL model. Through preliminary research, our group decided that a self-play method of training would be best, so I read a number of papers and tutorials on both the theory of self-play RL and the logistics of putting it into practice in Python. A few of the resources I used are shown below:
To train the self-play RL model, I need a competent opponent for the model to play against at first, before it can train against previous iterations of itself. If I choose too strong a starting opponent, the model will not get enough positive reinforcement (as it will almost never win); if I choose too weak an opponent, the reverse is true: it will almost always win and get too little negative reinforcement. As such, we will start with a relatively simple minimax strategy that looks two plies (single-player turns) into the future. However, to build this strategy, I need a platform for the game to be played on, so that the RL model can play the minimax opponent. This week I started building this platform, programming all game rules and actions along with a framework in which two separate players can interact on the same board. I then wrote unit tests to confirm that every game action behaves as intended. With the platform in place, I have begun programming the minimax strategy itself. This means I am on schedule, and I hope to have the minimax opponent available to start training within the week.
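To make the idea of a two-ply minimax opponent concrete, here is a minimal Python sketch. It is not the project's actual code: the board layout, every function name (`new_board`, `legal_moves`, `apply_move`, `evaluate`, `minimax`, `best_move`), and the rule set are assumptions for illustration. It assumes common Kalah-style rules (six pits per side, an extra turn when the last seed lands in your own store, and capture when the last seed lands in an empty pit on your side), and it simplifies the end of the game by treating a position as terminal as soon as the player to move has no legal move.

```python
from typing import List, Optional, Tuple

PITS = 6                      # pits per side (assumed Kalah layout)
P0_STORE, P1_STORE = 6, 13    # indices 0-5 / 7-12 are pits, 6 / 13 are stores

def new_board(seeds: int = 4) -> List[int]:
    board = [seeds] * 14
    board[P0_STORE] = board[P1_STORE] = 0
    return board

def legal_moves(board: List[int], player: int) -> List[int]:
    start = 0 if player == 0 else 7
    return [i for i in range(start, start + PITS) if board[i] > 0]

def apply_move(board: List[int], player: int, pit: int) -> Tuple[List[int], int]:
    """Sow from `pit`; return (new board, player who moves next)."""
    board = board[:]          # never mutate the caller's board
    own_store = P0_STORE if player == 0 else P1_STORE
    opp_store = P1_STORE if player == 0 else P0_STORE
    seeds, board[pit] = board[pit], 0
    pos = pit
    while seeds:
        pos = (pos + 1) % 14
        if pos == opp_store:  # skip the opponent's store while sowing
            continue
        board[pos] += 1
        seeds -= 1
    own_lo = 0 if player == 0 else 7
    opposite = 12 - pos       # pit facing `pos` (only valid for pit indices)
    # Capture: last seed landed in a previously empty pit on our own side.
    if own_lo <= pos < own_lo + PITS and board[pos] == 1 and board[opposite] > 0:
        board[own_store] += board[pos] + board[opposite]
        board[pos] = board[opposite] = 0
    # Extra turn if the last seed landed in our own store.
    next_player = player if pos == own_store else 1 - player
    return board, next_player

def evaluate(board: List[int], me: int) -> int:
    """Heuristic (an assumption): difference between the two stores."""
    diff = board[P0_STORE] - board[P1_STORE]
    return diff if me == 0 else -diff

def minimax(board: List[int], to_move: int, me: int, depth: int) -> int:
    moves = legal_moves(board, to_move)
    if depth == 0 or not moves:
        return evaluate(board, me)
    values = []
    for m in moves:
        nxt_board, nxt_player = apply_move(board, to_move, m)
        values.append(minimax(nxt_board, nxt_player, me, depth - 1))
    # Maximise on our own turns (including extra turns), minimise otherwise.
    return max(values) if to_move == me else min(values)

def best_move(board: List[int], player: int, depth: int = 2) -> Optional[int]:
    best, best_val = None, None
    for m in legal_moves(board, player):
        nxt_board, nxt_player = apply_move(board, player, m)
        val = minimax(nxt_board, nxt_player, player, depth - 1)
        if best_val is None or val > best_val:
            best, best_val = m, val
    return best
```

Because an extra turn can keep the same player on move, the sketch decides whether to maximise or minimise from whose turn it is rather than by alternating levels. Calling `best_move(new_board(), 0)` returns the index of the pit the depth-2 opponent would play from the opening position; raising `depth` would give a stronger (and slower) starting opponent.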