Team Status Report For 12.9.23

As I’m sure is common among many groups, this week our team focused almost exclusively on getting our project ready for the demo. On that front:

Engine: The engine is fully operational with high-accuracy value and policy networks. The demo version uses a simulation depth of 25, meaning the engine looks up to 25 moves into the future (in practice, the average depth tends to be 8-10, with a minimum of 4). The engine has been fully integrated into the web app, and both real-time and historical analysis work as intended.

Web-App: The web app is fully operational on its own. Integration with the engine is complete, as mentioned above, but integration with the physical board is not quite finished: the analysis function works as intended, and communication with the physical board is nearly done, pending a full debug of the board itself.

Physical Board: The physical board is almost complete. The LEDs have been soldered, so the only remaining issue is debugging two of the muxes, which are having minor problems. This should be resolved easily before the demo, as it is a known issue that has already been fixed on two of the other muxes. Full integration with the web app is done, but the overall product has not been fully tested, since that is not possible without working muxes. That said, with the outputs currently coming from the working muxes, the web app and engine behave exactly as expected, so we expect little or no further debugging once the final two muxes are fixed.

Once the muxes are debugged, we will be ready for the demo, and the coming week will be spent on the video and the final report.

ABET:

Unit Tests:

Engine:
Basic board positions to ensure all suggestions are legal moves. Found an error in how passes were signified on a 9×9 board; a pass is now signified by an index of 81 rather than 361.
Complex board positions to ensure the engine maximized or minimized depending on whose turn it is (minimize for black, maximize for white). Found an error where the exploration wasn’t inverted, so the optimization for black’s moves wasn’t working correctly. Fixed.
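A minimal pytest-style sketch of these checks (engine.suggest_moves() and the simplified legality helper are illustrative stand-ins, not our exact interface, and legality here ignores ko/suicide):

```python
# Hypothetical unit-test sketch; engine.suggest_moves() is a stand-in name.
import numpy as np
import engine  # our engine module (name assumed for illustration)

BOARD_SIZE = 9
PASS_INDEX = BOARD_SIZE * BOARD_SIZE  # 81 on 9x9 (would be 361 on 19x19)

def legal_moves(board):
    """Simplified legality: any empty intersection, plus the pass move."""
    empties = {i for i in range(PASS_INDEX) if board.flat[i] == 0}
    return empties | {PASS_INDEX}

def test_suggestions_are_legal():
    board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
    board[4, 4] = -1  # single black stone in the center
    for move in engine.suggest_moves(board, to_play=1):
        assert move in legal_moves(board)

def test_pass_is_index_81():
    board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
    suggestions = engine.suggest_moves(board, to_play=-1)
    assert all(0 <= m <= PASS_INDEX for m in suggestions)
```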

Physical Board:
Sensors were tested manually while disconnected from the board (i.e. connected only to the Arduino) to make sure they would fit our purposes. These ended up having thresholding issues, which prompted our change from 1 MΩ to 10 MΩ resistors in series with the photoresistors.
Tested a multitude of board positions (each intersection tested individually) to ensure the correct board state is sent to the Arduino. Found an issue with the muxes, which is currently being debugged.
Tested each individual button to make sure each signalled the web-app in the expected fashion.

Web-app:
Tested input states to make sure they were rendered as intended.
Tested responses to RPi signals (from Arduino) to ensure intended behavior.
Pytest suite for the endpoints and the WebSocket.
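A rough sketch of what these endpoint tests look like (the create_app() factory and the /state route are assumed names for illustration, not our exact API):

```python
# Illustrative pytest sketch; create_app() and the /state route are assumed names.
import pytest
from webapp import create_app  # hypothetical app factory

@pytest.fixture
def client():
    app = create_app(testing=True)
    with app.test_client() as client:
        yield client

def test_state_endpoint_accepts_valid_board(client):
    empty_board = [[0] * 9 for _ in range(9)]
    resp = client.post("/state", json={"board": empty_board, "to_play": "black"})
    assert resp.status_code == 200

def test_state_endpoint_rejects_malformed_board(client):
    resp = client.post("/state", json={"board": [[0] * 5]})  # wrong dimensions
    assert resp.status_code == 400
```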

Integration Tests:
Web-App -> Engine:
Tested interactions to make sure a prompt suggestion is delivered, while also making sure all suggested moves are legal (implying the sent state is processed correctly). This identified a secondary issue with how passes were being conveyed, which has since been fixed.
Timing analysis was performed to make sure the engine and communication components complied with our use-case requirements.
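The timing analysis itself reduces to measuring the web-app-to-engine round trip; a sketch of the approach, where request_suggestion() and the 5-second budget are placeholders rather than our actual client call and requirement:

```python
# Sketch of the latency measurement; request_suggestion() and the budget are placeholders.
import statistics
import time

LATENCY_BUDGET_S = 5.0  # assumed use-case requirement, for illustration only

def measure_latency(positions, request_suggestion):
    latencies = []
    for board in positions:
        start = time.perf_counter()
        request_suggestion(board)          # web app -> engine round trip
        latencies.append(time.perf_counter() - start)
    return latencies

def report(latencies):
    print(f"mean = {statistics.mean(latencies):.2f}s, "
          f"max = {max(latencies):.2f}s, n = {len(latencies)}")
    assert max(latencies) <= LATENCY_BUDGET_S
```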

Board->Web-App:
Extensive unit tests consisting of setting up the board and making sure the exact state on the board is the exact state shown on the web app. This caught issues with the muxes, which are currently being debugged, so a detailed write-up of the solution is not possible yet.
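Conceptually, each of these tests reduces to an exact array comparison between what the mux readout reports and what the web app stores; a toy version, with read_physical_board() and webapp_state() as stand-ins for the Arduino/RPi readout and the web app's stored state:

```python
# Toy integration check; both callables are stand-ins for our real interfaces.
import numpy as np

def states_match(read_physical_board, webapp_state):
    physical = np.asarray(read_physical_board())  # 9x9 grid of {-1, 0, 1}
    rendered = np.asarray(webapp_state())
    return physical.shape == (9, 9) and np.array_equal(physical, rendered)
```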

Nathan’s Status Report For 12.9.23

I am happy to say that not only is the engine fully ready for demo multiple days ahead of time, but also that the improvements from the MCTS training run in the past week were much larger than I originally anticipated.

Before switching to a 9×9 board, the original (initially trained) version of the policy network had just above 1% accuracy. While that sounds extremely low, it in fact meant the policy network identified the “best” move in the position roughly four times more often than random guessing (with 362 options); furthermore, there is a much higher chance that the “best” move is among its top 5 or 10 suggestions.

The second version, trained exclusively on 9×9 data generated from my MCTS runs, had an accuracy of around 9%, roughly seven times better than random guessing, since on 9×9 there are only 82 possible selections rather than 362. With each increase in accuracy, the expansion factor of the tree search (i.e. how many candidate moves are considered from each node) can be reduced, as it becomes more likely that the best moves are among the policy network’s stronger recommendations.
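As a quick back-of-the-envelope check of the “times better than random” figures (this arithmetic is mine, not a new measurement):

```python
# Random-guess baselines: 361 points + pass = 362 options on 19x19,
# 81 points + pass = 82 options on 9x9.
random_19 = 1 / 362          # ~0.28% chance of guessing the "best" move
random_9 = 1 / 82            # ~1.22%

print(0.01 / random_19)      # ~3.6x better than random (the ~1% 19x19 network)
print(0.09 / random_9)       # ~7.4x better than random (the ~9% 9x9 network)
```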

Finally, with the third version, I was expecting something around 20% accuracy; much to my surprise, the measured accuracy (on held-out validation data) was just over 84%. While this is probably not due to overtraining (the network architecture has several mechanisms to prevent it, and the performance holds on “unseen” validation data), I suspect some of this marked increase comes from the policy network essentially predicting the value network’s response, with an extra layer of abstraction. That is, if the policy network identifies which traits in a position the value network “likes”, its suggestions will line up more closely with the value network’s evaluations. This is, however, the point of MCTS: assuming a degree of accuracy in the value network (which I think is safe given the abundance of training data and its past performance), the policy network’s job is to identify the candidate moves most likely to lead to the strongest positions. Thus fewer candidate moves need to be explored, and deeper, more accurate simulations can be run in the same amount of time.

This 84% model is the policy network our engine will use during the demo; however, I am starting another simulation run tonight, whose results (if better) will be used in our final report. The engine is fully integrated into the web app backend and works exactly as intended.

As an added note, the code for the engine itself (i.e. excluding the simulation code, which can be found on the GitHub repository I linked earlier in the semester) can be found here.

Team Status Report For 12.2.23

A hardware risk that materialized was not being able to complete the 19×19 board in time for the demo. As such, we decided to change our design by downscaling the board from a 19×19 grid to 9×9, making the board roughly four times smaller. This change also affected the software and engine components of the project. On the software side, the data structures holding the board states had to be changed to accommodate the smaller board size, both in the client-side code and in the code on the Raspberry Pi server and the backend server.

This created some risk for the engine, as all the planning had been for a 19×19 version, including the training of the value network and the initial training of the policy network. After deliberation, we decided to retain the 19×19 network architectures but adapt their usage to fit a 9×9 board. This involved tweaking the engine so it converts 9×9 inputs into 19×19 via padding, allowing the networks to run as normal. Additionally, it required refactoring the MCTS code to run 9×9 simulations. These steps are necessary because switching the networks to 9×9 play hurts their relative strength, so extra data points are needed to correct for this.
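A minimal sketch of the padding idea (the choice to center the 9×9 grid and pad with zeros is illustrative; the real conversion also has to remap the pass index):

```python
# Sketch of the 9x9 -> 19x19 conversion so the existing networks can run unchanged.
import numpy as np

def pad_to_19(board_9x9):
    board_9x9 = np.asarray(board_9x9)
    assert board_9x9.shape == (9, 9)
    padded = np.zeros((19, 19), dtype=board_9x9.dtype)
    padded[5:14, 5:14] = board_9x9   # center the 9x9 grid inside the 19x19 input
    return padded
```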

As a quick note, the server and backend needed to be adjusted to fit the 9×9 board, but this was done very quickly and easily.

Our new schedule is shown below, and most of the changes are around adapting our system to the new 9×9 board:

Nathan’s Status Report For 12.2.23

Some unforeseen issues caused me to deviate a bit from my expectations for the week (i.e. what I said I would do in my previous status report). After extensive consultation, it was decided that we would change the board from its original 19×19 size to a smaller 9×9 size to reduce construction time. As such, there were a few extra tasks for the week, which I go into in more detail below.

Task 1: Multi-machine MCTS. In the past week(s) I have been running MCTS near-continuously across many of the ECE number machines and lab machines. This allowed me to collect around 150,000 data points for tuning, which brings me to the next task.

Task 2: Policy Network Initialization. With the data generated through MCTS, I was able to train the initial version of the policy network. This is useful beyond the accuracy increase itself: a stronger policy network means the expansion factor of MCTS can be reduced while maintaining the same strength at the same simulation depth. That is, because the suggested moves are better on average, fewer possibilities need to be explored, and execution time decreases.

Task 3: Converting the engine to 9×9 usage. The change to a 9×9 board required a change in the engine, as all the networks were set up to take a 19×19 input instead of a 9×9 one. This required a small amount of refactoring and debugging to make sure everything was working as intended.

Task 4: Converting MCTS to run at 9×9. As previously mentioned, the engine has been converted from 19×19 to 9×9 to conform to the physical board changes. Unfortunately, this reduces the engine’s relative strength, as the value network and the first iteration of the policy network were trained on 19×19 data. Accordingly, I refactored the MCTS code to simulate 9×9 games, which will generate more specialized data to tune both networks.

Accordingly, aside from prepping for final presentations and demo, I will just be running MCTS in parallel across machines to generate extra tuning data for both networks. The engine works as is, so this improvement is all I have to work on until demo.

Team Status Report For 11.18.23

Hardware:
There have been quite a few design changes with regards to the hardware component. The main differences from the previous design iteration are that wires will replace the vector boards, and instead of being located on said vector boards, the light sensors will be located on the wooden planks themselves. With regards to development, we are still wiring sensors, which is taking longer than expected due to the design changes. This also affects the planned interfacing with the Raspberry Pi and is the main risk for our project at the moment. To mitigate it, Hang has started to work on the hardware component along with Israel in order to reach the interfacing stage faster.

Software (backend):
We have a slight change in design for the software backend. After consultation with Prof. Tamal, we have decided to implement the communication between the Arduino connected to the physical board and the web server via a Raspberry Pi, instead of using the user’s computer. This will be done by hosting a Flask server on the Raspberry Pi itself. Before this can be implemented, however, Hang will continue helping Israel with the hardware component, as that has proved more complex than expected and thus requires more manpower to be finished in time.
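A bare-bones sketch of what such a bridge could look like (the serial device path, baud rate, and route name below are assumptions for illustration, not our final configuration):

```python
# Minimal sketch of a Flask server on the Raspberry Pi relaying the Arduino's
# board state; port name, baud rate, and routes are placeholders.
import serial                # provided by the pyserial package
from flask import Flask, jsonify

app = Flask(__name__)
arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1.0)  # assumed device path

@app.route("/board")
def board():
    raw = arduino.readline().decode().strip()   # e.g. "0,1,0,...,-1" (81 values)
    cells = [int(x) for x in raw.split(",")] if raw else []
    return jsonify({"cells": cells, "valid": len(cells) == 81})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```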

Software (Engine):
The engine has come together nicely and is in full working order; it is currently running MCTS in order to tune both the value and policy networks. However, as mentioned in Nathan’s weekly report, there is some risk around the execution speed of the MCTS. This does not threaten the viability of the engine, which already works and can be used as is. However, the faster and more efficient the MCTS implementation, the more scenarios can be examined in a given amount of time, and thus the more accurately the engine can play (assuming the evaluation skill of the networks is not harmed in the speedup). Nathan will therefore continue working on improving execution speed to allow for deeper simulations. At some point, he may also need to evaluate the trade-offs of reducing network complexity to improve MCTS timing, as the increase in allowable depth could outweigh the loss in network accuracy.

Schedule:
Other than Hang, whose changes are detailed above (helping with the hardware), there are no schedule changes. Because Hang is ahead of schedule, this added workload will not cut into his slack, and the added help will allow Israel to stay on schedule.

ABET:
We’ve definitely gotten a lot better at working together as a team as the semester has proceeded. At first, we did all of our work together, in person. This had quite a few limitations, including scheduling and interest levels. We have since expanded the ways in which we work together. Sometimes, when working on collaborative efforts (e.g. team reports, design documents, etc.), we work remotely but synchronously, connected over a voice call. This allows us to communicate effectively while still being in the comfort of wherever we choose to be (usually at home). We’ve also gotten better at working asynchronously. Our project sections (hardware, software, and engine) are fairly disjoint, so we spend a lot of time working individually; however, there are obviously times when we need to consult each other on how parts interact. We have gotten good at doing as much as possible while leaving a generalized interface, preventing blockages while still making interaction efficient and easy. Finally, due to the recent overlap in jobs (Hang helping on hardware), all of us have had to learn a bit more about how the hardware works, and Israel has done a great job teaching us so we can help him out.

Nathan’s Status Report For 11.18.23

I am once again happy to report that I was able to accomplish all of my stated goals from last week, and am again running on schedule.

I started by debugging MCTS locally, as I still had some errors in the code, including a scoreboard aliased among tree nodes that caused wildly inflated scores, and a missed case when checking which stones on the board would be captured by a particular move. Once I fixed these issues, among others, I was able to simulate multiple full matches of the MCTS engine playing against itself locally (stepping through each move manually) to cover any edge cases my test cases didn’t catch. Once this was finished, I ported it over to be run on the ECE machines.
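For reference, the capture check boils down to a flood fill for liberties: an opponent group adjacent to the newly placed stone is captured when the fill finds no empty neighbors. A condensed version of the idea (not the exact code from the repository; stones are encoded as 1/-1, empty as 0, and the new stone is assumed to already be on the board):

```python
# Condensed liberties / capture check sketch.
def group_and_liberties(board, row, col):
    color = board[row][col]
    size = len(board)
    group, liberties, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == 0:
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return group, liberties

def captured_by_move(board, row, col, color):
    """Opponent groups adjacent to the stone just placed at (row, col) with no liberties."""
    captured = set()
    size = len(board)
    for nr, nc in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        if 0 <= nr < size and 0 <= nc < size and board[nr][nc] == -color:
            group, libs = group_and_liberties(board, nr, nc)
            if not libs:
                captured |= group
    return captured
```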

Once moved over to the ECE machines, I set up the initial run of MCTS, which is running as I write this. As I complete more runs, the policy network’s strength will increase, and thus the expansion factor for each node in the tree can be lowered, reducing the computation required for each simulation (e.g. I might only need to consider the top 20 suggested moves at any given position from a stronger policy network, rather than, say, 50 from a weaker or recently initialized network).
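The expansion-factor idea in code form (a toy sketch; the board encoding and function names are simplified, with 362 = 361 points + pass on a 19×19 board):

```python
# Toy sketch: only the k most probable legal moves (per the policy network)
# become children of a node, so k can shrink as the policy network improves.
import numpy as np

def top_k_candidates(policy_probs, legal_mask, k=20):
    """policy_probs and legal_mask are length-362 arrays (361 points + pass)."""
    probs = np.where(legal_mask, policy_probs, 0.0)
    order = np.argsort(probs)[::-1]            # strongest suggestions first
    return [int(i) for i in order[:k] if probs[i] > 0]
```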

That being said, I am not fully satisfied with the time each MCTS iteration currently takes, and am therefore working on optimizing my implementation while simulations run. I was expecting about a 13x speedup from my local machine to the ECE machines, which is what I saw when training the value network, but for some reason this speedup is almost non-existent with MCTS, limiting the rate at which I can generate new data. As such, I am doing some research into what might be causing this (GPU utilization, etc.). Secondly, I am also optimizing my MCTS implementation directly. An example of the types of changes I’m making is only expanding a node (i.e. generating children for each of the top n moves) once it has been selected again after its own generation; that is, the search not only reached it as a temporary leaf, but also selected it again for expansion. This cuts down on the number of calls to the value network to evaluate positions, which seems to be the largest factor slowing the program down.
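A sketch of that lazy-expansion change (the node fields and helper names are illustrative, not the actual implementation):

```python
# Illustrative lazy expansion: a node's children (and the value-network calls
# needed to evaluate them) are only generated the second time the node is
# reached, i.e. when the selection phase actually asks to expand it.
class Node:
    def __init__(self, state):
        self.state = state
        self.children = None      # not generated at creation time
        self.visits = 0
        self.value_sum = 0.0

def maybe_expand(node, top_k_candidates, apply_move):
    node.visits += 1
    if node.children is None and node.visits > 1:   # expand only on re-selection
        node.children = [Node(apply_move(node.state, m))
                         for m in top_k_candidates(node.state)]
    return node.children or []
```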

Finally, I have settled on a definite policy network architecture: it is the same as the value network, but with a length-362 softmax vector as the final dense layer instead of a single sigmoid scalar.

Over the next week (Thanksgiving week) I plan to continue running MCTS simulations, training the policy network, and optimizing the system to increase the speed at which I generate training data.

Final note: as MCTS is now fully working, the engine can essentially be run to play against a real opponent (not just itself) at any time, since everything is synced up together. The engine will improve with each iteration of MCTS, but that just updates the weights of the constituent networks.

Nathan’s Status Report For 11.11.23

I’m happy to say that I accomplished everything I set as my goals last week and am in fact ahead of schedule at the moment. I was able to port my data over to the ECE machines and finish training there (with about a 13x speedup). I solidified both the weights and the architecture for the value network. I then used the design space exploration from the value network to solidify the architecture for the policy network. Finally, I began testing the MCTS process locally; once I am sure it fully works, I will port it over to continue on the ECE machines as well.

Starting with the relocation to the ECE machines, I was able to move 13 GB of training data (more than 3.5 million data points) over to my ECE AFS space so I could train the value network remotely. This had the added advantage of speeding up training by a factor of about 13, giving me more freedom with the network architecture. The architecture I settled on took about 13 minutes per epoch on the ECE machine, meaning it would have taken ~170 minutes per epoch locally, which would have been impossible, as even a lower bound of 50 epochs would have taken about a week.

Secondly, my finalized architecture is described below.

As you can see, there are three parallel convolution towers, with kernel sizes of 3, 5, and 7, which help the network derive trends in different-sized subsections of the board. Each tower then has a flattening layer and a fully connected dense layer. These are concatenated with the other towers, giving a single data stream that passes through successive dense and dropout layers to prevent overfitting, culminating in a single sigmoid output node, which provides the positional evaluation. This network was trained on 3.5 million data points, pulled evenly from over 60,000 expert-level Go matches. After training, the network was able to identify the winner of a game from a position 94.98% of the time, with a binary cross-entropy loss of 0.0886. This exceeded my expectations, especially considering many of the data points come from the openings of matches, where it is considerably harder to predict the winner because few stones have been placed.
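A compact Keras sketch of the architecture described above (the filter counts, dense widths, and dropout rates are illustrative placeholders, not the exact trained configuration):

```python
# Keras sketch of the value network; widths and rates are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

def build_value_network(board_size=19):
    inputs = keras.Input(shape=(board_size, board_size, 1))
    towers = []
    for kernel in (3, 5, 7):                      # three parallel convolution towers
        x = layers.Conv2D(64, kernel, padding="same", activation="relu")(inputs)
        x = layers.Flatten()(x)
        x = layers.Dense(128, activation="relu")(x)
        towers.append(x)
    x = layers.Concatenate()(towers)              # single data stream
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(1, activation="sigmoid")(x)   # positional evaluation
    # The policy network swaps this head for Dense(362, activation="softmax").
    return keras.Model(inputs, out)

model = build_value_network()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```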

Using my design space exploration for the value network, I was able to solidify an initial architecture for the policy network, which will have the same convolutional towers, differing only in the number of nodes in the post-concatenation dense layers and in its output, a length-362 vector.

I have started testing MCTS locally, with success; once I am convinced everything works as expected, I will port it over to the ECE machines to continue generating training data for the policy network, in addition to tuning the value network. Fortunately, since in the first iteration of MCTS the policy network evaluates all moves essentially equally, the training data will remain valid for further training even if the architecture of the policy network needs to be changed.

In the next week, I plan to move MCTS over to the ECE machines and complete at least one iteration of the (generate training data via MCTS, tune value network and train policy network, repeat) cycle.

ABET: For the overall strength of the Go engine, we plan to test it by simply having it play against different Go models of known strength, found on the internet. This will allow us to quantitatively evaluate its performance. However, the Go engine is made of 2 parts, the value and the policy networks. Training performance gives me an inkling into how these networks are working, but even with “good” results, I still test manually to make sure the models are performing as expected. Examples of this include walking through expert games to see how the evaluations change over time, and measuring against custom-designed positions (some of which were shown in the interim demo).

Nathan’s Status Report For 11.4.23

In my previous week’s report, I mentioned that my goal this week was to finish the design space exploration for the Value Neural Network, and begin running simulations. Unfortunately, I am running about one day behind schedule, as processing the expert-level games dataset into consumable board states took longer than expected. However, I have a baseline version of the value network set aside for the interim demo, and am finishing up the design exploration as we speak, meaning if a better model is trained between now and Monday I can replace the already competent baseline.

That being said, I have not fallen very far behind at all, and the delay is easily covered by the slack built into my schedule. However, there are a few things of note before I start simulation proper, the first being ECE machine setup. For the preliminary value network, I trained locally, as the training data I generated takes up roughly 40 GB of space, well above my AFS limit. However, locally I am also limited to 8 GB of RAM, meaning I can only use about 7.5 GB of this training set anyway. So even if I cannot port all 40 GB of data onto the ECE machines, anything over 8 GB would be an improvement and worth trying, in case it helps train a substantially different model. I am therefore planning to ask Prof. Tamal on Monday whom I should contact about getting my storage limit increased, and I will work on it from there.

The design space exploration has also yielded useful results in terms of what an allowable limit on network size would be. Locally, I’m currently operating with 2 convolutional layers, 1 pooling layer, and 1 fully connected dense layer, and this takes about 6.5 minutes per epoch with my reduced 8 GB training set. The ECE machines will compute faster, and this 6.5-minutes-per-epoch rate is far below my limit once we’re past the interim demo. This means that, if necessary, both the value and policy network architectures can grow without the training time becoming prohibitive.

Therefore, beyond our interim demo, I plan to begin simulations next week to generate my first batch of policy-network training and value-network tuning data. Ideally I get the AFS space increase quickly, meaning I can do this remotely, but if not, I can run it locally and port over the weights later. I also plan on setting up the architecture and framework for the policy network, so that I can begin training it as soon as the simulation data starts being generated.

Nathan’s Status Report For 10.28.23

In my previous weekly report, I mentioned that my goals for this week were to finish my expand() function and to find a dataset to train the initial values for my value network. I am happy to say that I have accomplished both, as well as finishing the rest of the code required for MCTS. The dataset I am going to use is located here and contains over 60,000 games from professional Go matches played in Japan. For the curious, all the code required for MCTS can be found on GitHub.

With the aforementioned dataset, I plan to begin training the value network as soon as possible (hopefully by Monday). While I have the dataset, it is stored in the Smart Game Format (SGF), which records the sequence of moves, not the sequence of board states they generate. As I need the board states themselves for training, I am currently working on a script to automatically process all 60,000 of these files, generating each board state and tagging it with the game result. These labeled states are the training data I require. Once this is finished, I can begin training, which involves not only the actual training over the dataset but also some design space exploration with regards to network architecture (number of nodes, types and number of layers, etc.). This will allow me to find a closer-to-ideal combination of accuracy and processing time (as efficient simulations are helpful in training, but vital for usage).
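A sketch of what this conversion looks like, assuming the sgfmill package for SGF parsing and capture handling (the actual script differs in detail; the black/white encoding below is illustrative):

```python
# Sketch of the SGF -> (board state, result) conversion using sgfmill.
import numpy as np
from sgfmill import sgf, boards

def states_from_sgf(path):
    with open(path, "rb") as f:
        game = sgf.Sgf_game.from_bytes(f.read())
    winner = game.get_winner()               # 'b', 'w', or None
    if winner is None:
        return []
    board = boards.Board(game.get_size())
    label = 1.0 if winner == 'b' else 0.0    # tag every state with the final result
    samples = []
    for node in game.get_main_sequence():
        colour, move = node.get_move()
        if colour is None or move is None:   # root node or pass
            continue
        board.play(move[0], move[1], colour)
        state = np.array([[{'b': 1, 'w': -1, None: 0}[board.get(r, c)]
                           for c in range(board.side)]
                          for r in range(board.side)], dtype=np.int8)
        samples.append((state, label))
    return samples
```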

This design space exploration will actually prove helpful for the policy network as well, as it will provide a baseline for the allowable complexity. (A higher number of nodes and layers will generally perform better, barring overfitting, but will take more time; the value network exploration will give me an estimate of how many layers I can use in the policy network, since the networks can be used to evaluate positions in parallel.)

Then, once I have the (mainly architectural) parameters for the networks set and the initial weights for the value network trained, I can immediately begin running simulations, as all of that infrastructure is complete. I should be running these simulations by the time of my report next week, and will aim to do my first training run on both networks with the data the simulations generate.


Team Status Report For 10.21.23

First of all, to address the major risk from our last report, we have made significant progress on the physical board, but we are not out of the woods yet. We have built the AutoCAD design and prepped for laser cutting our new 100 cm × 100 cm wooden plank. In addition, we have started assembling circuitry for testing purposes before making a final order for all our electrical components. We have not yet finished these tasks and have yet to start testing our software on this circuitry, but we plan to do so in the coming week. The risk here has also evolved a little: we realized we need fire and short-circuit protection, so we have developed a plan to provide it with a combination of insulation and strategic spacing of components, but we still need to actually add that to the board.


A secondary risk on the reinforcement learning side is the inability to find an open-source database of expert Go games. We felt confident that such a database existed, but should it not exist (or should the data be unclean or otherwise unusable), the value network for MCTS could not be trained to non-random weights before MCTS training commences. Fortunately, the mitigation comes from MCTS itself, as the search generates more value-network training data, that is, every board state plus the result of the game it came from. However, this would harm the efficacy of training at first (the network would improve more slowly), and while we have enough slack for that possible setback, it would be ideal to avoid.

Other than the aforementioned insulation and spacing, there are no changes to the design, and everything is running roughly on schedule. Development of the website is on track: the feature for saving game history is finished, and we have started on displaying the game history. Our hardware development is on track as well, and the reinforcement learning model is almost ready for training.