Nathan’s Status Report For 11.11.23

I’m happy to say that I accomplished all of the goals I set last week, and in fact am ahead of schedule at the moment. I was able to port my data over to the ECE machines and finish training there (with about a 13x speedup). I solidified both the weights and the architecture for the value network. I then used the design space exploration from the value network to solidify the architecture for the policy network. Finally, I began testing the MCTS process locally, and once I am sure it fully works, I will port it over to the ECE machines to continue there as well.

Starting off with the relocation to the ECE machines, I was able to move 13 GB of training data (more than 3.5 million data points) over to my ECE AFS space so I could train the value network remotely. This had the added advantage of speeding up training by a factor of about 13, which gave me more freedom with the network architecture. The architecture I ended up settling on takes about 13 minutes per epoch on the ECE machine, meaning it would have taken ~170 minutes per epoch locally; that would have been infeasible, as even a lower bound of 50 epochs would have taken about a week.

Secondly, my finalized value network architecture is shown below in listed form.

As you can see, there are three parallel convolution towers, with kernel sizes of 3, 5, and 7, which help the network derive trends in different-sized subsections of the board. Each tower then has a flattening layer and a fully connected dense layer. The outputs of the three towers are concatenated together, giving a single data stream that passes through successive dense and dropout layers (the dropout helps prevent overfitting), culminating in a single sigmoid output node, which provides the positional evaluation. This network was trained on 3.5 million data points, pulled evenly from over 60,000 expert-level Go matches. After training, the network was able to identify the winner of a game from a given position 94.98% of the time, with a binary cross-entropy loss of 0.0886. This exceeded my expectations, especially considering many of the data points come from the openings of matches, where it is considerably harder to predict the winner since few stones have been placed.
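For readers who prefer code to the listed form, here is a minimal PyTorch sketch of the same three-tower structure. The input encoding (a single-channel 19x19 board), the channel counts, the hidden-layer widths, and the dropout rate are illustrative placeholders rather than the exact values chosen in my design space exploration.

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Three parallel convolution towers (kernel sizes 3, 5, 7), each with a
    flatten and dense layer, concatenated and fed through dense/dropout layers
    to a single sigmoid output. Sizes here are illustrative guesses."""

    def __init__(self, in_channels=1, board_size=19):
        super().__init__()

        def tower(kernel_size):
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size, padding=kernel_size // 2),
                nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * board_size * board_size, 256),
                nn.ReLU(),
            )

        self.towers = nn.ModuleList([tower(k) for k in (3, 5, 7)])
        self.head = nn.Sequential(
            nn.Linear(3 * 256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # single node giving the positional evaluation
        )

    def forward(self, x):
        features = torch.cat([t(x) for t in self.towers], dim=1)
        return self.head(features)
```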

Using my design space exploration for the value network, I was able to solidify an initial architecture for the policy network, which will have the same convolutional towers, differing only in the number of nodes in the post-concatenation dense layers and in the output, which is a length-362 vector (one entry per board intersection plus the pass move).
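In code, only the head changes relative to the value network sketch above; a rough illustration (again with placeholder layer widths, and assuming a softmax over the 362 moves):

```python
import torch.nn as nn

# Hypothetical policy head: the same three towers feed it, but it ends in a
# length-362 vector (361 board points plus pass) instead of a single node.
policy_head = nn.Sequential(
    nn.Linear(3 * 256, 512),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(512, 362),
    nn.Softmax(dim=1),
)
```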

I have started testing MCTS locally, with success. Once I am convinced everything works as expected, I will port it over to the ECE machines to continue generating training data for the policy network, in addition to tuning the value network. Fortunately, since the policy network evaluates all moves essentially equally in the first iteration of MCTS, the training data generated will remain valid for further training even if the policy network’s architecture needs to change.

In the next week, I plan to move MCTS over to the ECE machines and complete at least one iteration of the (generate training data via MCTS, tune value network and train policy network, repeat) cycle.

ABET: For the overall strength of the Go engine, we plan to test it by simply having it play against different Go models of known strength, found on the internet. This will allow us to quantitatively evaluate its performance. However, the Go engine is made up of two parts, the value and policy networks. Training performance gives me an inkling of how these networks are working, but even with “good” results, I still test manually to make sure the models are performing as expected. Examples of this include walking through expert games to see how the evaluations change over time, and measuring against custom-designed positions (some of which were shown in the interim demo).

Nathan’s Status Report For 11.4.23

In my previous week’s report, I mentioned that my goal this week was to finish the design space exploration for the Value Neural Network, and begin running simulations. Unfortunately, I am running about one day behind schedule, as processing the expert-level games dataset into consumable board states took longer than expected. However, I have a baseline version of the value network set aside for the interim demo, and am finishing up the design exploration as we speak, meaning if a better model is trained between now and Monday I can replace the already competent baseline.

That being said, I have not fallen very far behind at all, and the gap is easily covered by the slack built into my schedule. However, there are a few things of note before I start simulation proper, the first being ECE machine setup. For the preliminary value network, I trained locally, as the training data I generated takes up roughly 40 GB of space, well above my AFS limit. However, locally I am also limited to 8 GB of RAM, meaning I can only use about 7.5 GB of that data anyway. So even if I cannot port all 40 GB onto the ECE machines, anything over 8 GB would be an improvement, and worth trying in case it helps train a substantially different model. I am therefore planning to ask Prof. Tamal on Monday whom I should contact about getting my storage limit increased, and I will work on it from there.

The design space exploration has also yielded useful results in terms of what an allowable limit on network size would be. Locally, I’m currently operating with 2 convolutional layers, 1 pooling layer, and 1 fully connected dense layer, and this takes about 6.5 minutes per epoch with my reduced 8 GB training set. The ECE machines will compute faster, and this 6.5-minutes-per-epoch rate is well within my time budget once we’re past the interim demo. This means that, if necessary, both the value and policy network architectures can grow without the training time becoming prohibitive.
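As a concrete reference point, the baseline I’m training locally looks roughly like the following sketch (assuming PyTorch and a single-channel 19x19 input; the filter counts and layer widths are placeholders, not the exact values in use):

```python
import torch.nn as nn

# Baseline local architecture: 2 convolutional layers, 1 pooling layer, and
# 1 fully connected layer feeding the sigmoid evaluation. Sizes are guesses.
baseline_value_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),           # 19x19 -> 9x9
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 1),  # single dense layer to the output node
    nn.Sigmoid(),
)
```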

Therefore, beyond our interim demo, I plan to begin simulations next week to generate my first batch of policy-network-training and value-network-tuning data. Ideally the AFS space increase comes through quickly so I can do this remotely, but if it does not, I can run the simulations locally and port the weights over later. I also plan on setting up the architecture and framework for the policy network, so that I can begin training it as soon as the simulation data starts being generated.

Team Status Report For 11.4.23

A hardware risk that we are currently facing is the delay in receiving parts, compounded by our delay in ordering them. This delay has put us behind in building the circuitry and testing our code on it, which could push our completion date and possibly affect our LED development in the latter half of the semester. For now, we plan to have a pre-made circuit design layout for the vector boards to speed up the build process when our components arrive, and to start soldering resistors as required. This should keep us from losing too much time and allow us to gain some of it back later.

One of the risks of site development is that people may not find specific parts of it intuitive to use. Some issues can seem a little nitpicky (for example, one of the two tabs should be highlighted so that the user can easily tell which page they are on), but having an easy-to-use site makes the product more appealing. To mitigate this issue, we’ll find some sample users, have them test out the site, and record their feedback on specific usability issues.

No changes were made to the existing designs of the 3 different components of the project (hardware, software, go engine).

However, there were some changes made to the schedule:

Hang’s Status Report For 11.4.23

This week, I spent some time making the UI look nicer and got the saved games to display on the site. I created a new page specifically for the saved games (with all the routing done with react-router) and added the necessary button and dropdown menu for uploading a file and choosing a move number. I also spent some time making the board interactive specifically for the interim demo, because we haven’t integrated the different parts yet; after the board, Go engine, and site are all integrated, I will remove this feature.

My progress is currently on track. By next week, I will set up a local Python server specifically for running the Python scripts for the Go engine, so that the frontend can make a request for the recommended game moves and the server can return a response containing them.
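As a rough sketch of what that server could look like (assuming Flask; the route name, request format, and the get_recommended_moves stub are placeholders for the actual Go engine call):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def get_recommended_moves(board_state):
    # Placeholder: the real version will invoke the Go engine scripts.
    return []

@app.route("/recommend", methods=["POST"])
def recommend():
    board_state = request.get_json()            # e.g. the flattened 361-entry board
    moves = get_recommended_moves(board_state)  # Go engine call goes here
    return jsonify({"recommended_moves": moves})

if __name__ == "__main__":
    app.run(port=5000)
```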

Team Status Report For 10.28.23

One of the risks that we are handling in the hardware development of the project is starting mid-scale development of the circuitry without having tested a subset of the circuit design. The main concern is that, even though the Arduino should theoretically be able to provide enough current for the subset of sensors we read at a time, it may not be able to provide enough current for all the sensors at once. If the Arduino cannot provide the required current, we will need to add more components to the circuit than expected to limit the current usage of each sensor.


For the secondary risk mentioned in last week’s team status report, we were able to find an open-source database with expert Go games; this fully mitigates that risk. The games are stored in a file format called SGF (Smart Game Format), where the moves of a game are stored in a tree, which allows for variations from the main line of play. The database has over 60,000 Go games, split into different categories.

For hardware, we have changed from photodiode sensors to photoresistors with an additional static 1 MOhm resistor in series. This decision was required because the photodiodes produced values over too narrow a range. That, along with the component requiring an additional resistor to restrict supply current, made the photodiodes impractical and even more expensive than other options. The new photoresistors have very simple characteristics, are cheaper, and should theoretically provide a larger range of light values with the 1 MOhm series resistor we have chosen. In addition, we have added more parts to the physical board design, as we will require an additional platform inside the board to support the multiple mini vector boards that need to be held near the top of the board’s holes.


Besides the hardware design change, there are no design changes to the reinforcement learning side and the software side. The development of the reinforcement learning model is ahead of schedule. The development of the software side is a little behind schedule, but the work can easily be caught up in time.

Updated schedule:

Physical board assembled and subcircuit testing breadboarded.

General vector board placement internal view

Nathan’s Status Report For 10.28.23

In my previous weekly report, I mentioned that my goal for this week was to finish my expand() function and to find a dataset to train the initial weights for my value network. I am happy to say that I have accomplished both of these, as well as finishing the rest of the code required for MCTS. The dataset I am going to use is located here and contains over 60,000 games from professional Go matches played in Japan. For the curious, all the code required for MCTS can be found on GitHub.

With the aforementioned dataset, I plan to begin training the value network as soon as possible (hopefully by Monday). While I have the dataset, it is stored in the Smart Game Format (SGF), which records the sequence of moves, not the sequence of board states those moves generate. As I need the board states themselves for training, I am currently working on a script to automatically process all 60,000 of these files, generating each board state and tagging it with the game result; those tagged states are the training data I require. Once this is finished, I can begin training, which involves both the actual training over the dataset and some design space exploration with regard to network architecture (number of nodes, types and number of layers, etc.). This will allow me to find a closer-to-ideal combination of accuracy and processing time (as efficient simulations are helpful in training, but vital for usage).
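A rough outline of that processing script is below. It assumes the sgfmill library for SGF parsing; the GoBoard class is a deliberately naive stand-in for our actual gameplay code (which handles captures, ko, and so on), and the array encoding is a placeholder.

```python
import numpy as np
from sgfmill import sgf  # assumed SGF-parsing library


class GoBoard:
    """Naive stand-in for the real board logic, included only so the sketch
    runs end to end; the real implementation handles captures, ko, etc."""

    def __init__(self, size=19):
        self.size = size
        self.grid = np.zeros((size, size), dtype=np.int8)

    def play(self, colour, move):
        row, col = move
        self.grid[row, col] = 1 if colour == "b" else -1

    def to_array(self):
        return self.grid.copy()


def process_game(path):
    """Turn one SGF file into (board states, game-result labels)."""
    with open(path, "rb") as f:
        game = sgf.Sgf_game.from_bytes(f.read())
    winner = game.get_winner()  # 'b', 'w', or None
    board = GoBoard(size=game.get_size())
    states, labels = [], []
    for node in game.get_main_sequence():
        colour, move = node.get_move()
        if colour is None or move is None:
            continue  # root node or a pass
        board.play(colour, move)
        states.append(board.to_array())
        labels.append(1.0 if winner == "b" else 0.0)
    return np.stack(states), np.array(labels)
```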

This design space exploration will actually prove helpful for the policy network as well, as it will provide a baseline for the allowable complexity. (A higher number of nodes and layers will generally perform better, barring overfitting, but will take more time; the value network exploration will give me an estimate of how many layers I can use in the policy network, since the two networks can be used to evaluate positions in parallel.)

Then, once I have the parameters (mainly architectural) for the networks set, and the initial weights for the value network trained, I can immediately begin running simulations, as all of that infrastructure is complete. I should be running these simulations by the time of my report next week, and will aim to do my first training run on both networks with the data the simulations generate.


Hang’s Status Report For 10.28.23

This week, I was working on displaying the saved game states. To do this, I needed a component that would open up a file finder, which turns out to be the standard <input/> element in React. After the user selects a file, I read the contents of the file into an array. The file contents follow this format: tile_type, move_number, repeated 361 times (19×19), with tile_type being either empty, black, or white, and move_number being either the move number of that tile or -1 to represent empty. Since the entire content is read in as a string, I used split to convert the string into an array. I also spent some time looking into React Router so that the site can have different pages, as I wanted to separate the page for live gameplay from the page for displaying saved games.

I’m a little bit behind schedule since I wasn’t able to display the saved game states yet. This is because I had a pretty large project due for one of my classes this week, so I spent most of my time focusing on that project. I’ll spend a little more time this following week to finish up displaying the game states.

My deliverables for next week will be displaying the saved game states and getting multiple pages working for the site.

Team Status Report For 10.21.23

First of all, to address the major risk from our last report, we have made significant progress on the physical board, but we are not out of the woods yet. We have built the AutoCAD design and prepped our new 100cm x 100cm wooden plank for laser cutting. In addition, we have started circuitry assembly for testing purposes before making a final order for all our electrical components. We have not finished these tasks and have yet to start testing our software on this circuitry, but we plan to do so in the coming week. The risk here has evolved a little as well: we realized we need fire and short-circuit protection, so we have developed a plan to provide it with a combination of insulation and strategic spacing of components, but we still need to actually add that to the board.


A secondary risk on the reinforcement learning side is the inability to find an open-source database of expert Go games. We felt confident that such a database existed, but should it end up not existing (or the data be unclean, in an unusable state, etc.), the value network for MCTS could not be trained to non-random weights before self-play training commences. Fortunately, the mitigation comes in the form of MCTS itself, as the search generates more value network training data: every board state plus the result of the game it came from. However, this would harm the efficacy of training at first (the network would improve more slowly), and while we have enough slack to absorb that possible setback, it would be ideal to avoid it.

Other than the aforementioned insulation and spacing, there are no changes to the design, and everything is running roughly on schedule. The development of the website is on-track with the feature for saving game history finished, and we have started on displaying the game history. Our hardware development is on track as well and the reinforcement learning model is almost ready for training.

Hang’s Status Report For 10.21.23

This week, I focused on saving game states. Originally, I wanted to save the board for each move along with the recommended moves from the Go engine; however, I’ve decided against this since the text file would be unnecessarily large. Instead, I’ll only save “one board” flattened into an array of 361 entries (19×19). Each element in the array will be a tuple of (“E”, -1), (“W”, some move number), or (“B”, some move number), where “E” corresponds to empty, “W” to white, and “B” to black. I won’t be saving the recommended moves; instead, I will reconstruct the board for each move selected for display and then feed that board state to the Go engine to get the recommended moves. While this takes a little more processing time when displaying game history, it’s worth avoiding the memory cost of storing large games or games with many moves.
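For reference, reconstructing the board at a chosen move number from that flattened format could look roughly like the Python sketch below (on the Go engine side, which will consume the reconstructed state); the function and variable names are placeholders.

```python
def board_at_move(saved, move_number, size=19):
    """saved: list of 361 (tile_type, stone_move) tuples in row-major order,
    where tile_type is "E", "B", or "W" and stone_move is the move number
    the stone was placed on (-1 for empty)."""
    board = [["E"] * size for _ in range(size)]
    for idx, (tile_type, stone_move) in enumerate(saved):
        # Keep only stones already on the board as of `move_number`.
        if tile_type != "E" and 0 <= stone_move <= move_number:
            board[idx // size][idx % size] = tile_type
    return board

# Example: an otherwise empty save with a black stone played on move 1 at (0, 1).
saved = [("E", -1)] * 361
saved[1] = ("B", 1)
print(board_at_move(saved, move_number=1)[0][:3])  # ['E', 'B', 'E']
```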

My schedule is on track as I’ve finished saving game history. By next week, I should have the feature for displaying game history done.

While I have some frontend/UI experience as I have done a full stack internship before, most of my work was in backend. For this project, almost all of my work will be with React/frontend, and I’ll be creating new components with CSS which I haven’t done before. I’ll also be looking into how to do serial communication between the web page I’m creating and the Arduino board for getting the game states with Web Serial API.

Nathan’s Status Report for 10.21.23

As I mentioned in my previous status report, my goal for this week was to finish custom loss function implementation, then work towards getting the MCTS training data generation working, given that the framework was already there.

With regard to the former of the two goals, that work is all complete. I had actually misunderstood my needs: I don’t need a custom loss function, as the two networks are trained with mean squared error (MSE) and binary cross-entropy loss (BCEL). Nevertheless, I have built the framework for training the two networks (policy and value), in addition to the data generation section of the MCTS code (where the board states and MCTS visit counts are stored as .npz files).
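For the data generation piece, the storage is essentially a pair of NumPy arrays per batch; a small illustration (the file name, array names, and shapes are placeholders):

```python
import numpy as np

states = np.zeros((128, 19, 19), dtype=np.int8)      # batch of board states
visit_counts = np.zeros((128, 362), dtype=np.int32)  # MCTS visit counts per move

# Write one batch to disk and read it back.
np.savez_compressed("mcts_batch_000.npz", states=states, visit_counts=visit_counts)
data = np.load("mcts_batch_000.npz")
print(data["states"].shape, data["visit_counts"].shape)
```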

With regard to the latter, there was a bit more involved than I originally anticipated. While I have all the Go gameplay functionality needed for training built, I am specifically not finished with the code for the MCTS expansion phase, where the leaf to expand is determined and its children are generated. That being said, I have built out all other functionality, and am working to finish the expand() function by Monday in order to stay ahead of schedule. A brief schematic of the training structure is shown below.

Once the expand() function is finished, the next step is finding a dataset of expert Go matches to use as training data for the value network pre-simulation. While this is not strictly necessary, and self-play could be used instead, giving the value network a strong foundation of expert-generated data raises the quality of the initial self-play training data much more quickly. My goal is to have expand() finished and this dataset found by the end of the week. If that goes according to plan, I will be able to commence network training and MCTS generation immediately afterwards.
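As a rough illustration of the expansion phase described above, the core of expand() amounts to generating a child node for every legal move from the chosen leaf. The Node fields and the legal_moves/apply_move helpers below are placeholders, not the actual structures in my MCTS code.

```python
class Node:
    """Minimal MCTS tree node."""

    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move
        self.children = []
        self.visits = 0
        self.value_sum = 0.0


def expand(leaf, legal_moves, apply_move):
    """Create one child of `leaf` per legal move from its position."""
    for move in legal_moves(leaf.state):
        child_state = apply_move(leaf.state, move)
        leaf.children.append(Node(child_state, parent=leaf, move=move))
    return leaf.children
```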

To accomplish my task, I had (and will have) to learn a few new tools. While not really a tool, I had to fully understand Monte Carlo Tree Search in order to program it correctly. More pertinently, I have never used PyTorch before, or worked with convolutional neural networks. While I have worked with similar technologies (fully connected deep neural networks and TensorFlow), I have had to do a lot of research into how to use them most effectively.