Team Status Report for 4/26

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

  • The most significant risk to the project is the reliability and durability of the pogo pins. We discovered that some individual pegs of the male pogo pins become permanently partially depressed (i.e., the spring mechanism degrades and no longer pushes the pin up to full height), possibly during soldering. This results in a bad connection between the grid and the block no matter how carefully we align the two pogo pins, which prevents us from playing the game at all, since we depend on the pogo pins for the wired connection that carries our UART communication. To manage this risk, we keep a few extra pre-wired male pogo pins on standby in case a grid pogo pin degrades or something else goes wrong. We have also found that pushing the male pogo pins up farther from the grid and pushing the block’s female pogo pin out farther can produce a good-enough connection even with defective pins, so our contingency plan if we cannot replace a pogo pin is to rework the grid/block construction slightly so these pins stick out a bit more. For now, the connections are working well and are relatively stable after we swapped out the bad pogo pins.
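Since a degraded pin shows up as a block that never ACKs, the connection check on the software side reduces to a retry loop over the UART link. Below is a minimal Python sketch of that idea only: `query_block`, `find_bad_blocks`, the ACK byte, and the `send_and_recv` callable are all hypothetical names (the real firmware protocol isn’t shown here), and the 0.2 s delay is the value from our UART tradeoff tests.

```python
import time

ACK = b"\x06"  # hypothetical ACK byte; the real protocol byte isn't shown here


def query_block(send_and_recv, block_id, retries=2, delay=0.2):
    """Query one block over the (pogo-pin) UART link.

    `send_and_recv` is any callable that writes a query for `block_id`
    and returns the raw response bytes (b"" on timeout). A block whose
    pins have lost contact simply never ACKs, so after `retries` extra
    attempts we report it as disconnected.
    """
    for _ in range(retries + 1):
        resp = send_and_recv(block_id)
        if resp.startswith(ACK):
            return True      # good electrical contact: the block answered
        time.sleep(delay)    # brief pause before re-querying
    return False             # likely a misaligned or degraded pogo pin


def find_bad_blocks(send_and_recv, block_ids, **kwargs):
    """Return the positions of all blocks that failed to ACK."""
    return [b for b in block_ids if not query_block(send_and_recv, b, **kwargs)]
```

Because the transport is passed in as a callable, the same loop can be exercised against a fake link in tests and against the real serial port on the device.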

 

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

  • Yes, we had to supplement the Merriam-Webster hint retrieval with generated output from Google’s Gemini LLM, since Merriam-Webster failed to retrieve definitions for a small percentage of the words in the dataset. This was necessary because some games had more than one word with no hints retrieved at all, which is extremely detrimental to our primary use case: making the game a better experience for novices/ELLs who rely on the hints for background knowledge about specific English words. The cost of this change was latency. Using Gemini boosted our hint retrieval rate to 100%, and the LLM hints were generally more readable and distinct than the Merriam-Webster ones (i.e., higher quality), but Gemini is about 15x slower than Merriam-Webster. To mitigate this cost, we only retrieve hints from Gemini when Merriam-Webster returns no definitions or context. Later, we decided to pre-generate all the Gemini hints and store them in a local database, so that during program execution we read from this database instead of making an API call. We then extended this technique to the whole NYT dataset, pre-generating Gemini hints for every word and storing them in one large JSON file that we read from during program execution. This makes hint retrieval O(1) and has the added benefit of the LLM output’s readability.
  • We decided to add sound effects and colors to the web app to draw the user’s attention to it during the answer-checking stage. This was necessary because, during user testing, some users were so focused on the physical blocks during answer checking that they never looked at the screen to see whether they were correct, incorrect, or one away. This change cost essentially nothing in latency; we just had to write more code.
  • We decided to add a “loading” page for the answer-checking stage, since the latency from button push to the appropriate result screen was long enough that the user couldn’t tell whether their button push had actually registered and would sometimes press the button again, resulting in an erroneous double submission. This change cost essentially nothing in latency; we just had to write more code.
  • We enhanced the “missing block” feature so the web app displays the specific positions of blocks that are missing or have a bad connection, letting the user fix these manually before resubmitting. This was necessary because aligning the pogo pins between block and grid is difficult for users to get right on the first try, since we had not implemented magnets on the blocks/grid; when blocks were misaligned, we had to inspect the backend ourselves to find the indices of the bad blocks. We did not implement magnets because each grid position’s and each block’s pogo pins differ slightly in placement, and we could not find a magnet arrangement that would be interchangeable across all blocks and grid positions. The cost of this feature was reworking the code a bit to relay this information, plus a slight latency increase: we now query all the blocks in the grid/row before returning to the web app instead of stopping at the first failure. To mitigate this, we decreased the UART query delay from 0.5 to around 0.1–0.3 seconds, which made our UART communications much faster.
  • We want our row LCDs to be able to “scroll,” so that a category name too long to fit on the row LCD is not cut off and the user can still read it. This is needed because the row LCD is the only place the user can see the category name after a correct guess.
  • We added a “super hints” feature to help users who were really stuck even with the provided hints; it reveals one word in each category. We thought this was necessary to make Connexus a less frustrating experience for players than the original game.
  • We added an “easy mode” with only 8 words and 2 categories. This mode is useful for novice players, since the game is much easier to win this way, and it helps the user get into the mindset needed for the normal game without all the frustration and difficulty of the full 16-word mode.
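The hint pipeline described in the first bullet above can be sketched as a cache-first lookup with the live APIs as fallbacks. This is only an illustration: `fetch_mw` and `fetch_gemini` stand in for the real Merriam-Webster and Gemini clients (not shown), and the cache is the pre-generated JSON file described above.

```python
import json


def load_hint_cache(path):
    """Load the pre-generated Gemini hints (word -> hint) from a JSON file."""
    with open(path) as f:
        return json.load(f)


def get_hint(word, cache, fetch_mw, fetch_gemini):
    """Return a hint for `word`.

    The pre-generated cache makes the common case an O(1) dict lookup;
    the live Merriam-Webster and Gemini calls only run for words that
    somehow aren't in the cache. `fetch_mw` / `fetch_gemini` are
    stand-ins for the real API clients and should return "" on a miss.
    """
    hint = cache.get(word)
    if hint:
        return hint              # O(1): pre-generated hint found
    hint = fetch_mw(word)        # ~0.11 s/word in our latency test
    if hint:
        return hint
    return fetch_gemini(word)    # ~1.5 s/word, fallback of last resort
```

Passing the fetchers in as parameters keeps the lookup order testable without touching either API.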

Provide an updated schedule if changes have occurred.

  • All the changes except the row-LCD scrolling have been implemented, tested, debugged, and integrated into our product. Thus, the updated schedule just includes potential aesthetic enhancements like 3D-printed frames for the block LCDs, a custom housing for our speaker, and NeoPixels for the blocks.

This is also the place to put some photos of your progress or to brag about a component you got working.

  • Please see our photos and videos here.

List all unit tests and overall system test carried out for experimentation of the system. List any findings and design changes made from your analysis of test results and other data obtained from the experimentation.

  • Hint retrieval quality test
    • Findings:
      • Merriam-Webster does not retrieve any definitions/contexts for a word 1.57% of the time
      • 5.37% of games have >1 missing word (“bad” games)
    • Design Changes:
      • Supplement hint retrieval with LLM API calls
      • Supplementing w/ Gemini, we are able to retrieve hints 100% of the time
  • Hint retrieval latency test
    • Findings:
      • Merriam-Webster has a latency of 0.11s per word
      • Gemini Flash 2.0 has a latency of 1.5s per word
      • [Merriam-Webster] Average “good” game latency = 1.81 sec
      • [Merriam-Webster + Gemini] Average “bad” game latency = 4.28 sec
      • Average game latency = (1-(33/614))*1.81+(33/614)*4.28 = 1.94 sec
      • We will meet our 3.2 sec latency requirement ~95% of the time
    • Design Changes:
      • We pre-generate all the Gemini hints and store them in our own local database so that we don’t have to make live API calls while the program is running; we just fetch the hint from our dataset.
  • Game logic accuracy test
    • Findings: Answer checking logic was 100% accurate
  • User testing
    • Findings:
      • More often than not, some blocks would be disconnected or misaligned, resulting in missing ACKs during UART communication. The user would not be able to see which blocks were being problematic.
      • Users wouldn’t realize the web app is showing the result of their submission and would miss it.
      • The puzzle was sometimes too hard even with hints.
    • Design Changes:
      • Displaying the positions of the missing blocks on the web app.
      • Colors and sounds during answer checking.
      • “Super hints” for each puzzle which gives 1 word in each category.
  • Battery Life of Blocks
    • Findings: Each battery lasts around 6 hours of continuous use
  • Weight and Size of Blocks
    • Findings:
      • Each block weighs around 181g and is 3.3” x 3.3” x 3.3”
  • UART Tradeoffs during upload_words
    • Delay (latency) vs Accuracy
      • Findings: The ideal combination was a 0.2s delay, giving a total upload latency of 3.2s with 100% accuracy
    • Baud Rate vs Accuracy
      • Findings: 
        • Baud rates > 115,200 are received within one try (0.2s delay)
        • Baud rates < 115,200 are received within 2 tries (0.4s delay)
  • UART Send & Receive
    • Findings: 
      • 100% accuracy
      • 200ms latency on average for each word
  • Answer Checking E2E
    • Findings:
      • Average incorrect latency: 0.9419006s
      • Average correct latency: 3.0390748s
      • Average latency: 1.9904877s
      • Average accuracy: 97.50%
        • Accidental double submissions or failed transitions occasionally caused inaccuracies.
        • Sometimes the category display on a row LCD failed.
    • Design changes:
      • Added a “get status” function to every web app page that involves gameplay so that we are guaranteed to transition to win/lose if game end conditions are satisfied no matter what state.
  • Word Upload
    • Findings:
      • Average latency: 3.231673384s
      • Average accuracy: 97.50%
    • Design changes:
      • Added the position reporting of the missing block to the frontend.
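As a sanity check on the hint-latency figures above, the reported average game latency follows directly from the measured per-game latencies and the 33/614 bad-game rate:

```python
# Weighted average game latency from the hint retrieval latency test:
# 33 of 614 games are "bad" (need the Gemini fallback); the rest are
# Merriam-Webster-only.
bad_frac = 33 / 614          # ~5.37% of games
good_latency = 1.81          # s, average "good" game (Merriam-Webster)
bad_latency = 4.28           # s, average "bad" game (MW + Gemini)

avg = (1 - bad_frac) * good_latency + bad_frac * bad_latency
print(round(avg, 2))         # 1.94, matching the figure reported above
```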
