In my previous weekly report, I mentioned that my goal for this week was to finish my expand() function, and to find a dataset to train the initial values for my value network. I am happy to say that I have accomplished both of these, as well as finishing the rest of the code required for MCTS. The dataset I am going to use is located here and contains over 60000 games from professional Go matches played in Japan. For the curious viewers, all the code required for MCTS can be found on Github.
With the aforementioned dataset, I plan to begin training the value network as soon as possible (hopefully by Monday). While I have the dataset, it is stored in the Smart Game File (SGF) format, which is the sequence of moves, not the sequences of board states generated. As I need the board states themselves for training, I am currently working on a script to automatically process all 60000 of these files, generating each board state and tagging them with the game result. These results are the training data I require. Once this is finished, I can begin training, which involves the physical training over the dataset, but also I will do some design space exploration with regards to network architecture (number of nodes, types and number of layers, etc.). This will allow me to find a closer to ideal combination of accuracy and processing time (as efficient simulations are helpful in training, but vital for usage).
This design-space exploration will actually prove helpful for the Policy Network as well, as it will provide a baseline for the allowable complexity. (Higher number of nodes and layers will generally perform better barring overfitting, but will take more time, the value network exploration will give me an estimate for the amount of layers I can use in the policy network as the networks can be used to evaluate positions in parallel.)
Then once I have the parameters (architectural mainly) for the networks set, and the initial weights for the value network trained, I can immediately begin running simulations, as all of that infrastructure is complete. I should be running these simulations by the time of my report next week, and will aim to do my first training run on both networks with the data the simulations generate.