Nathan’s Status Report For 11.18.23

I am once again happy to report that I was able to accomplish all of my stated goals from last week, and am again running on schedule.

I started by debugging MCTS locally, as I still had some errors in the code, including a scoreboard aliased among tree nodes (which caused rampantly inflated scores) and a missed case when checking which stones on the board a particular move would capture. Once I fixed these issues, among others, I was able to simulate multiple full matches of the MCTS engine playing against itself locally, stepping through each move manually to cover edge cases my test cases hadn't caught. Once this was finished, I ported the code over to run on the ECE machines.
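That capture case is easy to get wrong, so for illustration, here is a hedged sketch of capture detection via flood-fill liberty counting. This is not our engine's actual code; the board representation and helper names are placeholders:

```python
# Sketch: capture detection by counting a group's liberties with a flood fill.
# board: dict mapping (row, col) -> 'B' or 'W'; empty points are absent.
def group_and_liberties(board, r, c):
    """Return the connected group containing (r, c) and its liberty count."""
    color = board[(r, c)]
    group, liberties, stack = set(), set(), [(r, c)]
    while stack:
        pt = stack.pop()
        if pt in group:
            continue
        group.add(pt)
        pr, pc = pt
        for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
            if not (0 <= nr < 19 and 0 <= nc < 19):
                continue  # off the 19x19 board
            if (nr, nc) not in board:
                liberties.add((nr, nc))
            elif board[(nr, nc)] == color:
                stack.append((nr, nc))
    return group, len(liberties)

def captured_by_move(board, r, c, color):
    """Return the set of stones captured if `color` plays at (r, c)."""
    board = dict(board)          # work on a copy with the new stone placed
    board[(r, c)] = color
    enemy = 'W' if color == 'B' else 'B'
    captured = set()
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < 19 and 0 <= nc < 19 and board.get((nr, nc)) == enemy:
            group, libs = group_and_liberties(board, nr, nc)
            if libs == 0:        # enemy group left with no liberties is captured
                captured |= group
    return captured
```

This sketch ignores suicide and ko handling, which a full engine also needs.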

Once moved over to the ECE machines, I set up the initial run of MCTS, which is running as I write this. As I complete more runs, the policy network's strength will increase, and thus the expansion factor for each node in the tree can be lowered, reducing the computation required for each simulation (e.g., I might only need to consider the top 20 suggested moves at a given position from a stronger policy network, rather than, say, 50 from a weaker or freshly initialized network).

That being said, I am not fully satisfied with how long each MCTS iteration currently takes, so I am optimizing my implementation while simulations run. I was expecting about a 13x speedup from my local machine to the ECE machines, which is what I saw when training the value network, but for some reason this speedup is almost non-existent with MCTS, limiting the rate at which I can generate new data. As such, I am doing some research into what might be causing this (GPU utilization, etc.). Secondarily, I am also optimizing the MCTS implementation itself. One example of the changes I'm making: a node is only expanded (i.e., children generated for each of its top n moves) once it has been selected again after its own creation; that is, the search not only reached it as a temporary leaf node, but later selected it again for expansion. This cuts down on the number of calls to the value network to evaluate positions, which seems to be the largest factor slowing the program down.
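To make the deferred-expansion idea concrete, here is a minimal sketch under assumed names (the Node class, selection policy, and helpers are placeholders, not the actual implementation): a leaf is evaluated on its first selection and only expanded when the search selects it a second time.

```python
# Hedged sketch of "expand on second selection" in MCTS.
import math

class Node:
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def ucb(self, c=1.4):
        """UCB1 score used to pick a child during selection."""
        if self.visits == 0:
            return float('inf')
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select_leaf(root):
    """Walk down by UCB until reaching an unexpanded node."""
    node = root
    while node.children:
        node = max(node.children, key=Node.ucb)
    return node

def simulate_once(root, top_moves, evaluate):
    leaf = select_leaf(root)
    # Deferred expansion: children are only generated once the leaf has been
    # selected again after its creation (visits > 0), so far fewer
    # freshly-created nodes ever trigger a value-network call.
    if leaf.visits > 0:
        leaf.children = [Node(m, parent=leaf) for m in top_moves(leaf)]
        if leaf.children:
            leaf = leaf.children[0]   # simplification: descend to first child
    value = evaluate(leaf)            # value-network call, the expensive step
    while leaf is not None:           # backpropagate the result to the root
        leaf.visits += 1
        leaf.value_sum += value
        leaf = leaf.parent
```

On the first pass through a new node, only one evaluation happens; expansion (and the batch of child evaluations it eventually implies) is paid for only by nodes the search keeps returning to.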

Finally, I have settled on a definite policy network architecture: it is the same as the value network, except the final dense layer is a length-362 softmax vector instead of a single sigmoid scalar.
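For context on the 362: a 19x19 board has 361 points, and the extra entry presumably corresponds to passing. A pure-Python sketch of what the softmax output layer computes (the real network just uses a softmax-activated dense layer):

```python
# Softmax over 362 move logits: 361 board points plus one pass entry.
import math

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.0] * 362                       # dummy raw scores from the network
probs = softmax(logits)                    # a probability distribution over moves
```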

Over the next week (Thanksgiving week) I mean to continue running MCTS simulations, training the policy network, and optimizing the system to increase the speed at which I generate training data.

Final note: now that MCTS is fully working, the engine can essentially be run against a real opponent (not just itself) at any time, as everything is synced up together. The engine will improve with each iteration of MCTS, but that only updates the weights of the constituent networks.

Hang’s Status Report For 11.18.23

This week I spent some time setting up a websocket connection for the communication between the Arduino on the physical board and our web application. Since the physical board isn’t fully set up yet, the only thing I can test is the websocket communication. 

In the example above, the client side emits {test: "1"} to the server; the server simply prints the message and emits it back to all clients connected to the socket, and the client then prints the message it received. Once the board is finished, I can use Python's serial library to read data from the Arduino and send data back to it, and with the server's websocket, that information can be transferred between the server and the web application.
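Since the screenshot of that exchange isn't reproduced here, a quick sketch of the payload shapes involved. The board-update event name and fields are hypothetical placeholders for whatever we settle on once the serial link exists:

```python
# Sketch of the JSON payloads exchanged over the websocket.
import json

# What the client emits in the test described above:
test_message = json.dumps({"test": "1"})

# A hypothetical board-state update, as it might arrive once the Arduino
# serial link is wired up (field names are illustrative only):
board_update = json.dumps(
    {"event": "stone_placed", "row": 3, "col": 15, "color": "B"}
)

decoded = json.loads(board_update)   # what the server-side handler would see
```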

I also spent some time figuring out how to connect to the locally hosted web application and backend server from a different computer on the same network. This wasn't difficult, since React's npm command already takes care of it; for the Flask server, I simply had to add another flag to the run command. In both cases, connecting from another device requires my computer's IP address. The reason I looked into this is that it was suggested we make the communication between the web application and the physical board wireless. As such, I'm planning to run a Flask server on a Raspberry Pi connected directly to the Arduino. This Flask server would have the websocket as shown above for wireless bidirectional communication between the board and the web application.
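Concretely, the commands look something like the following; the ports are the framework defaults and <my-ip> is a placeholder, so treat this as a sketch of the setup rather than our exact invocation:

```shell
# React dev server (create-react-app) already listens on the local network:
npm start
# -> reachable from another device at http://<my-ip>:3000

# Flask only listens on localhost by default; one extra flag opens it up:
flask run --host=0.0.0.0
# -> backend reachable at http://<my-ip>:5000
```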

While my progress is technically on track since I'm now working on the communication steps, I had a pretty heavy fever for half of the week, so I didn't get to work on the project as much as I planned to. The Raspberry Pi component isn't currently a priority; building the physical board is. I will need to help Israel with the construction of the board, but by next week I can still test running a Flask server on a Raspberry Pi.

Team Status Report For 11.11.23

A hardware risk we have is soldering all the vector boards in a timely fashion. Even with the components on hand, we are still waiting on more sensors and resistors to arrive. Due to this long wait, Hang will practice vector-board soldering so we can speed up the process once the parts arrive.

 

A software risk is the integration between the hardware and software. The board hasn't been finished yet, so there is currently no way to test the communication between the board and the web application. To mitigate this risk, Hang will spend some time helping Israel with the construction of the physical board.

 

No changes were made to the existing designs of the hardware, software, or go engine.

Here is our current updated schedule:

11.11.23 schedule

Hang’s Status Report For 11.11.23

This week I started working on the backend of the project so that the frontend can communicate with the engine (otherwise we would have to translate the engine code into JavaScript or set up some kind of hack to run a Python script from the React web application). Since the entire project is locally hosted, I chose Flask as a simple backend framework. I ran into quite a few issues with CORS when making requests from the frontend to the backend, and spent some time adding different headers to my responses to try to make it work. After some experimentation, I got the request working, but only when I run the web application in Chrome rather than Edge, my default browser.
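For reference, the kind of headers involved look roughly like the following. The origin is a placeholder for wherever the React dev server runs; in Flask these would typically be attached in an after_request hook, or handled by the flask-cors extension:

```python
# Sketch of the CORS response headers a Flask backend would need to send
# for a React frontend on another origin. Origin value is a placeholder.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "http://localhost:3000",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

def with_cors(headers):
    """Return a copy of `headers` with the CORS fields merged in."""
    merged = dict(headers)
    merged.update(CORS_HEADERS)
    return merged
```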

For now, since the engine isn’t fully completed, I’m returning 5 placeholder recommended moves from the backend to the frontend (results displayed above).

My progress is currently on track. By next week, I will have some code for the communication between the Arduino and my web application, but since I can’t really test this out without the board being finished, I will have to help out Israel with the creation of the board to speed up its build process.

As for testing, I will write some pytest unit tests for the backend service. For the frontend, I will time the response of the frontend-backend communication simply by using timestamps (the use-case requirement is that a response should take less than 2 seconds, since the engine needs some time to compute moves), and I will also include Jest tests to make sure each page renders correctly.
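The timestamp check can be as simple as the following sketch; fetch_recommended_moves is a hypothetical stand-in for the real frontend-to-backend request:

```python
# Sketch of the timestamp-based latency check for the 2-second requirement.
import time

def fetch_recommended_moves():
    """Placeholder for the real frontend -> backend round trip."""
    time.sleep(0.01)                       # pretend the engine is thinking
    return [(3, 3), (15, 15), (3, 15), (15, 3), (9, 9)]

def timed_request(fn, limit_s=2.0):
    """Run fn(), returning (result, elapsed seconds, within-limit flag)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < limit_s
```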

Nathan’s Status Report For 11.11.23

I’m happy to say that I accomplished everything I set as my goals last week and in fact am ahead of schedule at the moment. I was able to port my data over to the ECE machines and finish training (with about a 13x speedup) on there. I solidified both the weights and the architecture for the value network. I then utilized the design space exploration from the value network to solidify the architecture for the policy network. Finally, I began testing the MCTS process locally, and once I am sure it fully works, I will port it over to continue on the ECE machines as well.

Starting off with the relocation to the ECE machines: I was able to move 13 GB of training data (more than 3.5 million data points) over to my ECE AFS so I could train the value network remotely. This had the added advantage of speeding up training by a factor of about 13, meaning I had more freedom with the network architecture. The architecture I settled on took about 13 minutes per epoch on the ECE machine, meaning it would have taken ~170 minutes per epoch locally, which obviously would have been impossible, as even a lower bound of 50 epochs would have taken about a week.

Secondly, my finalized architecture is shown below in listed form,

As you can see, there are three parallel convolution towers, with kernel sizes of 3, 5, and 7, which help the network derive trends in different-sized subsections of the board. Each tower then has a flattening layer and a fully connected dense layer. These layers are concatenated together with the other towers, giving a single data stream that passes through successive dense and dropout layers to prevent overfitting, culminating in a single sigmoid output node that provides the positional evaluation. This network was trained on 3.5 million data points, pulled evenly from over 60,000 expert-level Go matches. After training, the network was able to identify the winner of a game from a position 94.98% of the time, with a binary cross-entropy loss of 0.0886. This exceeded my expectations, especially considering many of the data points come from the openings of matches, where it is considerably harder to predict the winner since few stones have been placed.
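For reference on that loss figure: binary cross-entropy for a single example is -(y log p + (1 - y) log(1 - p)), averaged over the dataset, so a low value means the network is assigning high probability to the correct winner. A quick illustration:

```python
# Binary cross-entropy for one example: y_true in {0, 1}, p_pred in (0, 1).
import math

def binary_cross_entropy(y_true, p_pred):
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

# A confident, correct prediction is cheap; a confident, wrong one is costly.
good = binary_cross_entropy(1, 0.95)   # low loss
bad = binary_cross_entropy(1, 0.05)    # high loss
```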

Using my design space exploration for the value network, I was able to solidify an initial architecture for the policy network, which will have the same convolutional towers, differing only in the number of nodes in the post-concatenation dense layers and in the output, which is a length-362 vector.

I have started testing MCTS locally, with success; once I am convinced everything works as expected, I will port it over to the ECE machines to continue generating training data for the policy network, in addition to tuning the value network. Fortunately, since in the first iteration of MCTS the policy network evaluates essentially all moves equally, the training data will remain valid for further training even if the policy network's architecture needs to change.

In the next week, I plan to move MCTS over to the ECE machines and complete at least one iteration of the (generate training data via MCTS, tune value network and train policy network, repeat) cycle.

ABET: For the overall strength of the Go engine, we plan to test it by having it play against Go models of known strength found on the internet, which will allow us to evaluate its performance quantitatively. However, the Go engine is made of two parts, the value and policy networks. Training performance gives me an inkling of how these networks are working, but even with "good" results, I still test manually to make sure the models are performing as expected. Examples of this include walking through expert games to see how the evaluations change over time, and measuring against custom-designed positions (some of which were shown in the interim demo).

Nathan’s Status Report For 11.4.23

In my previous week's report, I mentioned that my goal this week was to finish the design space exploration for the value neural network and begin running simulations. Unfortunately, I am running about one day behind schedule, as processing the expert-level games dataset into consumable board states took longer than expected. However, I have a baseline version of the value network set aside for the interim demo, and am finishing up the design exploration as we speak, meaning that if a better model is trained between now and Monday, I can replace the already-competent baseline.

That being said, I have not fallen far behind, and the gap is easily covered by the slack built into my schedule. However, there are a few things of note before I start simulation proper, the first being ECE machine setup. For the preliminary value network, I trained locally, as the training data I generated takes up roughly 40 GB of space, well above my AFS limit. Locally, though, I am also limited to 8 GB of RAM, meaning I can only use about 7.5 GB of this training set anyway. As such, even if I cannot port all 40 GB of data onto the ECE machines, anything over 8 GB would be an improvement, and worth trying in case it helps train a substantially different model. I am planning on asking Prof. Tamal on Monday whom I should contact about getting my storage limit increased, and I will work on it from there.

The design space exploration has also yielded useful results regarding an allowable limit on network size. Locally, I'm currently operating with 2 convolutional layers, 1 pooling layer, and 1 fully connected dense layer, which takes about 6.5 minutes per epoch with my reduced 8 GB training set. The ECE machines will compute faster, and this 6.5-minutes-per-epoch rate is far below my limit once we're past the interim demo. This means that, if necessary, both the value and policy network architectures can grow without training time becoming prohibitive.

Therefore, beyond our interim demo, I plan to begin simulations next week to generate my first batch of policy-network-training and value-network-tuning data. Ideally I get the space increase on AFS quickly, meaning I can do this remotely, but if not, I can run it locally and port the weights over later. I also plan on setting up the architecture and framework for the policy network, so that I can begin training it as soon as the simulation data starts being generated.

Team Status Report For 11.4.23

A hardware risk we are currently facing is the delay in receiving parts, compounded by our own delay in ordering them. This has put us behind in building and testing, which could push our completion date and possibly affect LED development in the later half of the semester. For now, we plan to lay out a pre-made circuit design on the vector boards to speed up the build process when our components arrive, as well as to start soldering resistors as required. This at least avoids wasting too much time and lets us gain some back later.

One of the risks of site development is that people may not find specific parts of it intuitive to use. Some issues can seem a little nitpicky (e.g., one of the 2 tabs should be highlighted so that the user can easily tell which page they are on), but having an easy-to-use site makes the product more appealing. To mitigate this issue, we'll find some sample users, have them test out the site, and record their feedback on specific usage issues.

No changes were made to the existing designs of the 3 different components of the project (hardware, software, go engine).

However, there were some changes made to the schedule:

Hang’s Status Report For 11.4.23

This week, I spent some time making the UI look nicer and got the saved games to display on the site. I created a new page specifically for the saved games (all the routing done with react-router) and added the necessary button and dropdown menu for uploading a file and choosing a move number. I also spent some time making the board interactable specifically for the interim demo because we haven’t integrated the different parts yet, so after the board, go engine, and site are all integrated, I will remove this feature. 

My progress is currently on track. By next week, I will set up a local python server specifically for running the python scripts for the go engine, so that the frontend can make a request for getting the recommended game moves, and the server will return a response containing the recommended moves.

Israel's Status Report For 8.29.2023

Tasks completed

I have finished the laser cutting for our board, as well as the software expected for our initial circuitry. I finished troubleshooting and testing a small section of our light-sensor circuitry and have ordered new parts for us.

Progress status

Even though I have tested our circuitry at a smaller scale, I need to repeat a similar process for the different light sensors. This means I am falling slightly behind, but I can at least start on the LED portions.

Tasks to complete

I need to finalize the internal layout of the board to support the vector boards near the top panel. In addition, I need to adjust the software for the new light sensor we are using, but that should be a quick task. I may also start on integration or the LED-addition tasks while I wait for the light sensors to arrive.

Team Status Report For 10.28.23

One of the risks we are handling in the hardware development of the project is starting mid-scale development of the circuitry without having tested a subset of the circuit design. The concern is that our Arduino, even though theoretically able to provide enough current for the required sensor-subset read time, may not be able to supply enough current for all the sensors at once. If the Arduino cannot provide the required current, we will need to add more resistance into the circuit than expected to limit each sensor's current draw.

 

For the secondary risk mentioned in last week's team status report, we were able to find an open-source database of expert Go games, which fully mitigates that risk. The games are stored in a file format called SGF (Smart Game Format), where moves are stored in a tree that allows for variations from the main line of play. The database has over 60,000 Go games, split into different categories.
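To make the format concrete, here is a minimal sketch of extracting the main-line moves from an SGF string. Real SGF files carry many more metadata properties and branching variations; this only handles simple ;B[..]/;W[..] move properties:

```python
# Minimal sketch of pulling main-line moves out of an SGF string.
import re

def sgf_moves(sgf_text):
    """Return [(color, (col, row)), ...]; SGF letters 'a'..'s' map to 0..18."""
    moves = []
    for color, coord in re.findall(r";([BW])\[([a-s]{2})\]", sgf_text):
        col = ord(coord[0]) - ord('a')
        row = ord(coord[1]) - ord('a')
        moves.append((color, (col, row)))
    return moves

# A tiny SGF fragment: game metadata, then three opening moves.
game = "(;GM[1]SZ[19];B[pd];W[dp];B[pq])"
```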

For hardware, we have changed from photodiode sensors to photoresistors with an additional static 1 MOhm resistor in series. This change was required because the photodiodes' readings did not span a high enough range; that, along with the component requiring an additional resistor to restrict its supply current, made it impractical and even more expensive than other options. The new photoresistors are very simple in their characteristics, cheaper, and should theoretically provide a larger range of light values with the 1 MOhm resistor we have chosen in series. In addition, we have added more parts to the physical board design, as we will require an additional internal platform to support the multiple mini vector boards held near the top of the board's holes.
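To illustrate why the series resistor gives a usable range: the photoresistor and the fixed 1 MOhm resistor form a voltage divider, and (assuming the ADC taps the node between the two, with the fixed resistor to ground) the Arduino sees a voltage that swings widely between light and dark. The resistance values here are illustrative, not measured:

```python
# Voltage-divider sketch for the photoresistor + 1 MOhm series resistor.
def divider_voltage(vcc, r_photo, r_fixed=1_000_000):
    """Voltage across the fixed resistor, which the ADC samples."""
    return vcc * r_fixed / (r_photo + r_fixed)

def adc_counts(voltage, vref=5.0, bits=10):
    """Arduino-style 10-bit ADC reading for that voltage."""
    return round(voltage / vref * (2**bits - 1))

bright = divider_voltage(5.0, r_photo=100_000)     # low LDR resistance in light
dark = divider_voltage(5.0, r_photo=10_000_000)    # high LDR resistance in dark
```

With these illustrative values, bright and dark positions land near opposite ends of the ADC range, which is exactly the separation the stone-detection logic needs.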

 

Besides the hardware design change, there are no design changes on the reinforcement learning side or the software side. The development of the reinforcement learning model is ahead of schedule. The software side is a little behind schedule, but the work can easily be caught up in time.

Updated schedule:

Physical board assembled and subcircuit testing breadboarded.

General vector board placement internal view