Nathan’s Status Report For 10.7.23

Now that we have fully settled on our transition to Go and how it will be implemented, I have been able to fully focus on my part of the project. With our team's design presented on Monday, and my research into AlphaZero and MuZero concluded last week, I had the rest of the week to focus solely on implementation.

Almost all of my time this week was spent on the Go simulation framework, as well as beginning to figure out how to set up the reinforcement learning architecture. With regards to the former, I worked together with Hang to make sure we have a clear plan for passing board information from the Arduino to my backend engine. From there, I implemented a backend representation of the board, a function that allows an outside controller (in this case the engine) to make a move for simulation purposes, and functions to update the position based on the information conveyed by the physical board. Along with basic rule checking (such as determining whether the game is over, which I have also implemented), this is effectively all I need to move on to the reinforcement learning architecture.

The real challenge there is the custom loss functions defined in our design proposal (expected result optimization and policy vector normalization). I have never worked with custom loss functions in Python before, so I have done a huge amount of research into different ways to accomplish this. I settled on PyTorch, as it is not only the current industry consensus for the best deep learning framework, but also extremely well supported in Python. I have started, but not completed, actually scripting these loss functions; I am taking my time to make sure they are not only correct but also efficient, since in conjunction with the MCTS simulations, training times could balloon rapidly if either is implemented inefficiently.
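To give a flavor of the kind of loss terms involved, here is a minimal pure-Python sketch. The actual implementation will use PyTorch tensors so autograd can backpropagate through the losses, and the exact formulas come from our design proposal; the function names and the weighting below are placeholders, not the real definitions:

```python
def expected_result_loss(predicted_value, game_outcome):
    # Penalize the squared difference between the network's predicted
    # result (e.g. in [-1, 1]) and the actual game outcome.
    return (predicted_value - game_outcome) ** 2

def policy_normalization_loss(policy_vector):
    # Penalize how far the raw policy vector is from summing to 1,
    # nudging it toward a valid probability distribution.
    total = sum(policy_vector)
    return (total - 1.0) ** 2

def combined_loss(predicted_value, game_outcome, policy_vector, weight=0.5):
    # Hypothetical weighting between the two terms; in practice this
    # would be a tuned hyperparameter.
    return (expected_result_loss(predicted_value, game_outcome)
            + weight * policy_normalization_loss(policy_vector))
```

In the PyTorch version, these would operate on batched tensors inside a module's `forward` pass rather than on Python scalars and lists.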

In the next week, I plan to finish these custom loss functions, then work on getting the in-training MCTS simulations to work. With the simulation framework already built, this shouldn’t require too much time.

Team Status Report For 10.7.23

The major risk we are taking at the moment is spending more time on the implementation of the board than expected. With less support from Techspark than initially expected, we have had to focus on building a board of our own. To ensure this does not put us behind schedule, we plan to dedicate more time next week to catching up. If the board implementation does require more time to build, which is neither planned nor expected, we will look for external resources that may help us build the physical board, or failing that, for possible substitutes to making a custom board.

We currently have no changes in our design other than our shift of projects from the earlier week.

Schedule update:

One principle of engineering that we used is modularity. One specific example in the site design is the code for visualizing the Go boards. Go boards are typically 19×19, but the code is made modular such that it can visualize any N×N board size. We attempted to make our code as modular as possible because modularity typically reduces complexity and makes parts more reusable.

In addition, we have used skills from electrical engineering and electromagnetic physics principles when deciding on components to use, as many components have limitations. The principles applied include Ohm's law and Kirchhoff's current law (KCL) for our simple circuitry design, as well as knowledge of component characteristics.

With the custom board development, one of the engineering concerns we dealt with was the dimension requirements and constraints of the box that would hold our electronics. This board development required skills in product engineering to make sure the board was both easy and safe for users to use.

Hang’s Status Report For 10.7.23

Instead of focusing on game history, I decided to flip the order and focused on visualization of the game board first. This ended up being a little trickier than I thought: I originally planned to draw the board with divs as tiles, but while working through it, I remembered that the pieces fall on the intersections of the lines, not between them, and there was no good way of drawing a piece slightly off of a div without messing up the entire grid (React expects children to fall inside their parent containers). Instead, I had each “tile” fall under one of 9 categories: top left corner, bottom left corner, bottom right corner, top right corner, top, left, right, bottom, and middle. This way, I can place the pieces inside each tile, instead of attempting to put a piece at the intersection of 4 tiles.
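The classification logic itself is language-agnostic; here is a rough sketch of the idea in Python (the real code lives in React components, and the category names below are just illustrative):

```python
def tile_category(row, col, n):
    """Classify a cell of an n x n board into one of the 9 visual
    categories used to draw the grid lines for that tile."""
    vert = "top" if row == 0 else "bottom" if row == n - 1 else ""
    horiz = "left" if col == 0 else "right" if col == n - 1 else ""
    if vert and horiz:
        # The four corner tiles get special two-sided line drawing.
        return f"{vert} {horiz} corner"
    # Edge tiles get one-sided drawing; everything else is a full cross.
    return vert or horiz or "middle"
```

Each rendered tile then picks its line styling (and where to place a stone inside itself) from its category, rather than trying to position stones at the intersection of four neighboring divs.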

My progress is currently on schedule; I've just flipped the order so that I worked on the visualization of the board before game saving.

By next week, I expect to have game saving done, and also start to work on game loading.

Team Status Report For 9.30.23

The major risk we are currently facing is making sure we successfully transition our project’s goals and requirements without losing a large amount of progress. As we will go into more below, we are transitioning our project from a website to play Mancala on, combined with an engine, to a physical Go board that displays engine recommendations and stores game histories locally for further analysis. This now creates a hardware (physical board) requirement for our project. Fortunately, a large amount of our research is applicable to the new project, and some of the software we had already begun to write can be adapted to meet our new needs. Nevertheless, we need to make sure we are not over-committing, and have a plan to catch up on the small amount of time now lost. To do so, we have defined a new MVP, adjusting our human-computer interaction requirements, making plans to adapt as much of our pre-existing work as possible, and adjusting our schedules (condensing some earlier steps) so that, while we will have to commit to extra work for a few weeks, we will not be in a crunch at the end.

As mentioned above, we have made monumental changes to our project and its design. After receiving helpful feedback from TAs and students in regards to our proposal, we realized that our use-case was not strong enough and our project did not have the requisite breadth. These two major flaws led us to change our project focus to Go instead. Due to the competitive gaming community there is much more demand for a Go training product, and the Go equivalent to a chess DGT board (which is one of the services our project will provide) has not been created. This switch will also incorporate a hardware component that records players’ games and allows our website component to show analysis for these already-played games. These changes have forced us to rearrange our schedule a bit (as seen below), but that, combined with the other mitigating actions we took (as described above), will allow us to stay on target.

New Schedule:

Hang’s Status Report For 9.30.23

After our proposal, we decided to switch our project so that we could cover 3 ECE areas instead of 2, in case the machine learning component of our project fails. With our new project, my role changes slightly. I’m still working on the site (solely working on the site, since Israel will now work on the Go board, which is our embedded system), but we decided that we don’t need to host our site remotely. Instead, we will just host it locally.

With the change in our project, we won’t need a dedicated backend server, so we won’t need AWS, and most of the development will just be with React and JavaScript. I got started on setting up this React web-app: installing Node and then creating the web-app. Besides working on setting up the initial site, this week was spent going over the different project ideas after we got our feedback from the proposal, and then working on the design presentation once we finalized our project idea.

The schedule is on track, and by next week, I expect saving game history to be finished, and I’ll start to work on the visualization of saved game history.

My role on this project is entirely software, but most of the software classes I took here are lower-level systems courses, and this project is higher on the abstraction level. One class that would be relevant is 15-122 since I may use some data structures for the site design. For example, for the visualization of the game history, I want to show the board state for a specific move, so I’ll store the game states of that specific game into a hash table with the move number being the key and the game state being the value.
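A minimal sketch of that lookup structure (the site itself will be in JavaScript, and the names here are hypothetical; in practice a plain JS object or `Map` plays the role of the hash table):

```python
# Hypothetical saved-game store: move number -> board state after that move.
game_states = {}

def record_move(move_number, board_state):
    # board_state could be any serializable snapshot of the 19x19 grid.
    game_states[move_number] = board_state

def board_at_move(move_number):
    # O(1) lookup when the user jumps to a specific move in the history.
    return game_states.get(move_number)
```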

Nathan’s Status Report For 9.30.23

As is mentioned in our team status report for the week, we have transitioned our project away from Mancala and into the game Go instead. Fortunately, the division of labor remains relatively similar and I am still broadly responsible for the training of a reinforcement-learning engine that will eventually be used to give move suggestions and positional evaluations to our users.

Accordingly, almost all of the research I did last week is still applicable, as the same self-play techniques can be used, and, in fact, have been proven to work in the cases of  AlphaZero and MuZero. After making the transition to Go this week, I had to do a quick catch-up on the rules and gameplay, but after that, along with the two above-linked papers, the research phase of my project has come to a close.

Of course, with the design presentations coming up next week, a good amount of my time this week was devoted to preparing for that as well, and the rest was spent building the platform for the reinforcement learning. The current consensus for optimal Go engine creation is a combination of deep learning and Monte Carlo Tree Search (MCTS). MCTS works by using self-play to simulate many game paths from a given position, and choosing the move providing the best overall outcome. I have started work on creating the framework to perform these simulations as quickly as possible (holding game state, allowing the candidate engine to make moves against itself and returning the new board, etc.).
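As a rough illustration of the averaging idea behind these simulations, here is a simplified "flat" Monte Carlo sketch in Python. A full MCTS additionally grows a search tree and biases playouts toward promising branches (e.g. via UCB selection), and the game interface used here (`is_over`, `legal_moves`, `play`, `result`) is hypothetical, not our actual framework's API:

```python
import random

def random_rollout(game, state, player):
    # Play uniformly random moves from `state` until the game ends,
    # then return +1 if `player` won, -1 if they lost, 0 for a draw.
    while not game.is_over(state):
        state = game.play(state, random.choice(game.legal_moves(state)))
    return game.result(state, player)

def best_move_by_simulation(game, state, player, playouts_per_move=100):
    # For each legal move, estimate its value by averaging the outcomes
    # of many random playouts, then pick the move with the best average.
    best_move, best_score = None, float("-inf")
    for move in game.legal_moves(state):
        next_state = game.play(state, move)
        score = sum(random_rollout(game, next_state, player)
                    for _ in range(playouts_per_move)) / playouts_per_move
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

The efficiency concern mentioned above is visible even here: every candidate move costs `playouts_per_move` full games, so any slowness in the board representation multiplies across the whole search.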

With regards to classwork helping me prepare for this project, I think the two ECE classes that helped the most are 18213 and 18344. I have not taken any classes in reinforcement learning or machine learning in general, but the research I did in the CyLab with ECE Prof. Vyas Sekar certainly helped me a huge amount, both in the subject matter of the research (deep learning) and in the experience of reading scholarly papers to fully understand techniques you are considering using. What 18213 and 18344 provided was the “correct” way of thinking about setting up my framework. I need my simulations to be as efficient as possible while also maintaining accuracy, and I need my system to be as robust as possible, as I will need to make frequent changes, tune parameters, etc. These, combined with the research papers read last week and the two papers above, are what influenced my portion of the design the most.

Finally, in the next week I plan to finish the Go simulation framework, and begin work on setting up the reinforcement learning architecture, to begin training in the week after. MCTS simulation is quite efficient, but with the distinct limit on computational resources I have, allocating proper time is vital.

Israel’s Status Report for 9.23.2023

Tasks accomplished

I have started ramping up on JavaScript and React usage with informational videos, some of them from Mosh Hamedani (Older but more thorough use of react 1) (Newer tutorial on how to use react 2). I followed along with these videos as well to practice using React in preparation. I have also looked into Mozilla's documentation for JavaScript itself, which has helpful functional usage.

In addition, I looked for some UI-based libraries and packages to use with React that might be helpful. One of the ones of focus is BluePrint, due to its very well-made documentation and its customization integrated with CSS, which might prove beneficial in the future given my prior experience with CSS. Other ones of interest that might be used are as follows:

Progress status

Finished ramping up on JavaScript and React usage for this week's plan.

Tasks to complete

I plan to do a quick overview of CSS to become more familiar with the format, as well as HTML, in case it proves useful in the future.

WebSocket familiarity is the number one priority for my interface with the backend. I plan to ramp up much more on WebSocket usage and knowledge.

In addition, I plan to start designing the Mancala frontend's basic pages and components, initially making an interface and blueprint of planned functions and pages. I plan to use the framework from my learning ramp-up for my codebase as well.

If everything turns out well and plans don't stray off, I plan to have a codebase, with all my TODOs and file locations organized, so that implementation can start smoothly.

Team Status Report for 9.23.2023

One of our biggest risks currently is the possibility that the planned minimax strategy will not prove effective as an initial opponent for the self-play RL model (be it too strong or too weak). If it is too weak, the platform that we are building it on can be extended to look more than 2 ply into the future. While this will increase training time (as the calculations will take longer to compute), it will provide a stronger opponent. On the other hand, if it proves too strong, we have other, even more basic strategies on standby, such as 1-ply maximization (just maximize the number of stones captured in one move, ignoring the possible responses) or even a random agent.

With regard to changes, a possible problem pointed out during the presentation was the idea that some variants of Mancala were solved. While we had always planned on this, we had not made clear that the version we were building for our website was an unsolved variant (the seven stone variant of Kalah Mancala). Some players use other ways to get around the solved aspect of the game such as switching positions after the first move, but those add unnecessary complication to the game, raising the barrier for entry, especially for younger players. This will not cause any increase in price, or changes to the system itself, but does specify requirements a bit better. Other than that there have been no changes to the system or structure of the project.

For right now, everyone is on schedule, so no changes are necessary there.

The effects our project will have on public safety, the economy, and the environment are relatively minimal. Of course, we are using a small amount of computational power on training the RL model and maintaining our servers, but in the grand scheme of things it is next to nothing. That being said, our project certainly has a non-trivial effect socially, and could possibly improve mental health for some users as well. Multiplayer games are inherently social, and an online platform for them provides an outlet for users to connect with other like-minded individuals. The fact that there is no major website dedicated to Mancala makes it all the more important. Beyond meeting new people and possible friends, our project would also allow friends to play each other directly; for friendships where it is difficult for the participants to see each other (long-distance, etc.), this can help strengthen the bond. Finally, though this may only be relevant a tiny percentage of the time, the small amount of social interaction from online gaming can make a significant difference in mental health. It is all too easy to shut yourself away and not interact with anyone, and as this effect compounds it becomes harder and harder to break out of. Online social interaction can be a small step in the right direction, and our platform could provide that.

Nathan’s Status Report for 9.23.2023

This week I did a combination of research on reinforcement learning, and opponent/platform setup to enable the RL model training.

With regard to research, I want to understand as much as possible about reinforcement learning before I start the process of actually building a Mancala RL model. Through preliminary research our group decided that a self-play method of training would be best, so I read a number of papers and tutorials on both the theory of self-play RL and the logistics of putting it into practice in Python. A few of the resources I used are shown below:

OpenAI Self-Play RL

HuggingFace DeepRL

Provable Self-Play (PMLR)

Towards Data Science

Python Q-Learning

In order to train the self-play RL model, I must have a competent opponent for the model to start off playing against, before it can train against previous iterations of itself. If I choose too strong of a starting opponent, the model will not get enough positive reinforcement (as it will almost never win), and if I choose too weak of one the reverse is true. As such, we will start with a relatively simple minimax strategy that looks two “ply” (single player turns) into the future. However, to build this strategy, I need a platform for the game to be played on (so the RL model can play the minimax opponent). This week I started building this platform, programming all game rules and actions, and a framework where two separate players can interact on the same board.  I then implemented unit tests to make sure all game actions were functioning as they should. With this now in place, I have begun programming the minimax strategy itself. This means I am on schedule, and hopefully will have the minimax available to start training within the week.
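A generic sketch of that kind of depth-limited minimax in Python; the game interface names here (`legal_moves`, `play`, `evaluate`, `to_move`, `is_over`) are hypothetical placeholders, not our actual Mancala platform's API:

```python
def minimax_move(game, state, player, depth=2):
    # Choose the move maximizing `player`'s evaluation after looking
    # `depth` ply (single player turns) into the future.
    def value(state, d):
        if d == 0 or game.is_over(state):
            # Static evaluation at the search horizon, e.g. the
            # player's stone-count advantage in Mancala.
            return game.evaluate(state, player)
        vals = [value(game.play(state, m), d - 1)
                for m in game.legal_moves(state)]
        # Maximize on the root player's turns, minimize on the opponent's.
        return max(vals) if game.to_move(state) == player else min(vals)

    return max(game.legal_moves(state),
               key=lambda m: value(game.play(state, m), depth - 1))
```

With `depth=2` this matches the two-ply lookahead described above, and extending the lookahead (the mitigation discussed in the team report) is just a matter of raising `depth`.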

Hang’s Status Report for 9.23.2023

Since I was the presenter for the proposal of our project, the first half of the week was dedicated to practicing the script.

After our project was presented, I put my focus into finding the tutorials that would be necessary for setting up the infrastructure of our project: namely how to set up AWS Lambda as our compute platform and how to set up a DynamoDB database. I came across this tutorial: Building a serverless multi-player game that scales | AWS Compute Blog (amazon.com), which seems perfect for our use-case. This tutorial builds a trivia game using Lambda functions with both http endpoints and websocket endpoints and uses DynamoDB tables as their database. They use Vue.js as their frontend, but we should be able to easily switch to React.

I can use this tutorial to set up the necessary infrastructure for our project and test out the endpoints that they have created. Once I get an understanding of how their code works, I can start working on our project’s backend, starting with the game logic.

The progress is currently on schedule. By the end of next week, the infrastructure should be set up, and I should be able to test communication between the demo’s frontend and backend. I should also start writing game logic for our Mancala gameplay.