Shannon’s Status Report for 11/30/24

This week, I focused on finishing up the TTS feature on the robot. Since the feature works well on the WebApp, I decided to integrate it fully with the robot’s speakers. I first ensured that the user’s input text could be sent properly via WebSockets to the robot. Once this was achieved, I used the Google Text-to-Speech (gTTS) library on the RPi to translate the text into an mp3 file, and then worked on playing the mp3 file through the speakers. On my personal computer (a MacBook), the line to play audio is os.system(f"afplay {audio_file}"). However, since the RPi is a Linux system, this does not work, so I tried os.system(f"xdg-open {audio_file}") instead. This played the audio file, but it also opened a command terminal for the VLC media player, which is not what I wanted, since the user would not be able to play further audio files without quitting the terminal first. I therefore looked up other ways to play the audio file, which led me to os.system(f"mpg123 {audio_file}"). It worked well and played the audio with no issues.

I timed the latency, and it was mostly under 3s for a 50-word text. If the text was longer and was broken into 50-word chunks, the first chunk would take slightly longer, but the subsequent chunks were mostly under 2.5s, which is in line with our use-case and design requirements. With this, the text-to-speech feature is mostly finished. There is still a slight issue: for a better user experience, I wanted the WebApp to display when a chunk of text was done being read, but the WebApp is currently unable to do so. After some debugging, I found that this is because the WebApp tries to display before the WebSocket callback function has returned. Since the function is asynchronous, I would have to use threading on the WebApp if I still want this display to appear. I might not keep this detail, because introducing threading could cause issues, and the user should be able to tell when a chunk of text is done being read from the audio itself. Nevertheless, the text-to-speech feature now works on the robot: the user can input a .txt file, the robot reads out the first x words, and when the user clicks continue, it reads out the next x words, and so on. I think this feature is final-demo ready.
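To make the flow above concrete, here is a minimal sketch of the RPi-side playback path, assuming a python-socketio client and an illustrative "read_text" event name (the exact event names and file paths in our code differ):

    import os
    import socketio
    from gtts import gTTS

    sio = socketio.Client()

    @sio.on("read_text")
    def on_read_text(data):
        text = data["text"]                           # one chunk of the user's .txt input
        audio_file = "tts_chunk.mp3"
        gTTS(text=text, lang="en").save(audio_file)   # synthesize the chunk to an mp3
        # mpg123 is a lightweight CLI mp3 player; unlike xdg-open it plays
        # in-process and exits when done, so no stray VLC terminal appears.
        os.system(f"mpg123 {audio_file}")

    sio.connect("http://<webapp-host>:8000")          # hypothetical WebApp address
    sio.wait()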


According to the Gantt chart, I am on target. 


In the next week, I’ll be working on:

  • Helping Jeffrey finish up the Study Session feature
  • Finishing up any loose ends on the WebApp (deployment, code clean-up, etc.)


For this week’s additional question:

I had to learn how to use TTS libraries such as pyttsx3 and gTTS. I thoroughly reviewed their respective documentation at https://pypi.org/project/pyttsx3/ and https://pypi.org/project/gTTS/ to understand how to configure their settings and integrate them into my project. When debugging issues, I relied on online forums like Stack Overflow, which provided insights from others who had encountered similar problems. For example, when I ran into the run loop error, I searched for posts describing similar scenarios and experimented with the suggested solutions. It was there that I saw someone recommend gTTS instead, explaining that gTTS avoids this issue because, unlike pyttsx3, it does not use a persistent engine: it converts the text to an mp3 file first and then plays it, rather than converting and playing as it goes. This led me to switch over to gTTS, which is what we used in the end.

I also had to learn WebSockets for real-time communication between the RPi and the WebApp. I read through the documentation online at https://socket.io/docs/v4/, which was great for understanding how the communication process works. It also taught me how to set up a server and client, manage events, and handle acknowledgments. For debugging, I used tools that I had previously learned in other classes, such as the Chrome developer tools console and the VSCode debugger with breakpoints and logpoints, which allowed me to diagnose CORS issues and verify, through the logs/errors displayed, whether events were being emitted and whether the emitted events were being received.
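As a concrete example of the event/acknowledgment pattern from the Socket.IO docs, here is a minimal sketch using python-socketio (event names are illustrative, not our exact code):

    import socketio

    sio = socketio.Server(cors_allowed_origins="*")   # permissive CORS for local testing
    app = socketio.WSGIApp(sio)

    @sio.event
    def connect(sid, environ):
        print(f"client {sid} connected")

    @sio.on("status_update")
    def status_update(sid, data):
        print(f"received: {data}")
        return "ack"   # the return value is delivered to the emitter's ack callback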

Shannon’s Status Report for 11/16/24

This week, I worked on WebSockets with Jeffrey during our Tuesday and Thursday meet-ups. Initially, I spent some time helping Jeffrey set up his virtual environment, ensuring that he had access to our GitHub repository and that he was ultimately able to run our WebApp on his computer, so that he could test the DSI display showing the correct information based on the WebApp inputs. Jeffrey later ran into some git commit issues that I also worked with him to resolve (he had accidentally committed the virtual environment folder to our GitHub repository, resulting in more than 1 million lines being committed and leaving him unable to git pull due to the sheer volume of content). Unfortunately, as of right now, we are still running into issues using Socket.IO to have the WebApp communicate with the RPi and display. Previously, we were able to communicate with just the RPi itself; however, when trying to draw up the DSI display using Tkinter, Jeffrey ran into issues communicating between the WebApp and the RPi. While WebApp-to-RPi communication works, the other direction does not, and he is still working to resolve this issue. As such, although I was hoping to test the WebApp display based on RPi-sent information, I was unable to do so while this communication from the RPi to the WebApp remains buggy. Hopefully Jeffrey resolves this issue in the next week, and I will be able to more thoroughly test the code I have written.


I have also worked on improving the latency of the TTS feature. Previously, there was an issue where, upon a large file upload, the TTS would take a long time to process the text before speaking. As such, I have changed the TTS interface to include an option for the user to choose how many words they want in a “part”. If a user inputs 50 words, when they click “Start Reading!”, the input .txt file is processed, the text is split into 50-word parts, and the first 50-word part is read. After reading, the website displays “Part 1 read successfully” and a new button appears, saying “Continue?”. If the user clicks on it, the next 50-word part is read. Once all parts have been read, a message reading “Finished reading input text, upload new .txt file to read new input.” appears, and the “Continue?” button disappears.
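The chunking itself is simple; here is a minimal sketch of the word-splitting helper (the function name is illustrative, not our exact code):

    def split_into_parts(text, words_per_part=50):
        """Split text into parts of at most words_per_part words."""
        words = text.split()
        return [" ".join(words[i:i + words_per_part])
                for i in range(0, len(words), words_per_part)]

    # A 50-word file with the limit set to 25 yields two parts,
    # matching the two-part example shown below.
    parts = split_into_parts(open("input.txt").read(), words_per_part=25)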

User can upload a .txt file; default word count is set to 50 (latency of ~2s).

Reading a 50-word text with the maximum word limit set to 25 words (should read in two parts).

After the first part/25-word chunk is read successfully, continue button appears:

After second part is read successfully, continue button disappears and the finished message appears.

Lastly, this week I also worked on a slight UI improvement for the website. When a user ends a study session, there is now star confetti! (Similar to Canvas submissions).

According to the Gantt chart, I am on target and have finished all individual components that I am responsible for. All other tasks that I am involved in are collaborative, with either Jeffrey (WebSockets – RPi receiving and sending info) or Mahlet (TTS on Robot) in charge. Although slightly behind on what I initially planned for the interim demo (Study Session not fully working), everything else I had planned to talk about is working.

In the next week, I’ll be working on:

  • Helping Jeffrey finish up the Study Session feature
  • Helping Jeffrey to start RPS Game feature
  • Implementing TTS feature on the Robot with Mahlet

Shannon’s Status Report for 11/9/2024

This week, I worked on ensuring that Study Session information could be sent via WebSockets, and I managed to succeed in doing so. The WebApp can successfully send over information when the user creates a Study Session, and it can send over information that the user has ended a Study Session. As for robot-to-WebApp communication, because the pause button on the robot for Study Sessions has not yet been implemented and tested, I have not yet been able to verify the code I wrote for handling such an input received through WebSockets. Theoretically, upon the pause button being pressed, the RPi should send a message via WebSockets through something like socket.emit("Session paused"), and upon receiving such a message, the WebApp display page will show “Study Session on Break” instead of “Study Session in progress”. Ideally, I want to test this with the actual pause button being pressed on the robot, but if Jeffrey runs into issues implementing that in time, I will test it by having the RPi send the message 10 seconds after it receives the start-session information by default, to see if the code I have written actually works. In conclusion, WebApp-to-robot communication is working (Figure 1); robot-to-WebApp communication still needs testing on the WebApp end.

Figure 1: RPi receiving Study Session information from the WebApp.
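For reference, here is a sketch of the planned pause round-trip, assuming Flask-SocketIO on the WebApp side and a python-socketio client on the RPi (event names are illustrative):

    # --- RPi (client) side ---
    import socketio

    sio = socketio.Client()
    sio.connect("http://<webapp-host>:8000")   # hypothetical WebApp address

    def on_pause_pressed():
        sio.emit("session_status", "Session paused")

    # --- WebApp (server) side ---
    from flask import Flask
    from flask_socketio import SocketIO

    app = Flask(__name__)
    socketio_server = SocketIO(app, cors_allowed_origins="*")

    @socketio_server.on("session_status")
    def handle_session_status(msg):
        # Re-broadcast so the display page can swap "Study Session in
        # progress" for "Study Session on Break".
        socketio_server.emit("display_update", {"state": msg})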

I also worked on the WebSocket code for the RPS Game this week. Unlike the Study Sessions, there is significantly less communication between the WebApp and the robot, so I only worked on this after I was confident that WebSockets were working for our Study Session feature. For the RPS Game, all the WebApp has to do is send over the number of rounds the user wishes to play at the start of the game; all gameplay then occurs on the robot with the RPi, DSI display, and the RPS game buttons. When the game ends, the robot sends back game statistics via WebSockets, which get displayed on the WebApp. I am able to send the RPS Game information to the robot with no issue, but I have yet to test the receiving of information and the display that should occur, which I will focus on next week. As before, WebApp-to-robot communication is working; robot-to-WebApp communication needs testing.
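Since the whole RPS protocol is just two messages, here is a sketch of their shapes, reusing the sio clients from the sketch above (event names and fields are assumptions for illustration):

    # WebApp -> robot, at game start: only the round count is needed.
    sio.emit("rps_start", {"rounds": 3})

    # Robot -> WebApp, at game end: aggregate statistics to display.
    sio.emit("rps_results", {"wins": 2, "losses": 1, "ties": 0})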

For the TTS feature, I didn’t have as much time to work on it this week, but I managed to implement reading from a .txt file instead of just inputting text into a text field! A user can now upload a .txt file, and gTTS is able to read out its text.

Lastly, this week I also worked on the overall UI of our page to make sure that everything looks neater and more visually appealing. Previously, the links were all connected together and messy, but I have separated the header bar into individual sections with buttons, so the overall UI looks more professional. I will continue to work on improving the overall style of our website, but it now more closely resembles the mock-ups that I drew up in the design report.

According to the Gantt chart, I am on target.

In the next week, I’ll be working on:

  • Finishing up the Study Session feature
  • Finishing up TTS feature on the WebApp
  • Testing RPS Game statistics display on the WebApp
  • All Interim demo goals are listed under Team Report!

Team Status Report for 11/09/2024

Currently, the biggest risk is to the overall system integration. Shannon has the WebApp functional, and Jeffrey has been working on unit testing individual parts of the code, such as the RPS/DSI display. We will have to ensure that the overall pipeline is smooth: inputs from the GPIO pins on the robot must be processed by the RPi, the relevant information must then be sent to the WebApp through WebSockets (so we can record information such as rock-paper-scissors win/loss/tie results), and the WebApp must then display the correct information based on what it received.
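As a concrete sketch of the first hop in that pipeline, assuming gpiozero for the button and a python-socketio client (the pin number and event name are illustrative):

    import socketio
    from gpiozero import Button

    sio = socketio.Client()
    sio.connect("http://<webapp-host>:8000")   # hypothetical WebApp address

    rock_button = Button(17)                   # hypothetical GPIO pin

    def report_rock():
        # Forward the press so the WebApp can record the RPS choice.
        sio.emit("rps_input", {"choice": "rock"})

    rock_button.when_pressed = report_rock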

We will also need to perform some latency testing to ensure that this process is happening with little delay. (e.g. pausing from the robot is reflected promptly on the WebApp – WebApp page should switch from Study Session in progress to Study Session on break page almost instantly). 

Due to the display screen to RPi ribbon connector’s length and fragility, we have decided to limit the neck rotation to a range of 180 degrees. In addition, translational motion is also limited because of this. Therefore, by the interim demo, we only intend to have the rotational motion, and depending on the flexibility of the ribbon connector, we will limit or get rid of the translational motion. 

Interim demo goals:

Mahlet: 

  1. I will have audio localization working within, or close to, the 5-degree margin of error in simulation.
  2. I plan to have the correct audio input signals in each microphone, and integrate this input with the audio processing pipeline in the RPi.
  3. I will integrate the servo motor with the neck motion, and make sure the robot’s neck motion is working as desired.
  4. I will work with Shannon to ensure TTS functionality through gTTS and will do testing on pyttsx3 directly from the RPi.

Shannon: 

I aim to have the Study Session feature fully fleshed out for a standard Study Session, such that a user can 

  1. Start a Study Session on the WebApp (WebApp sends information to robot which starts timer)
  2. Pause it on the robot (and it reflects on the WebApp)
  3. When goal duration has been reached, the robot alerts WebApp and WebApp displays appropriate confirmation alert 
  4. User can choose to end the Study Session or continue on the WebApp (WebApp should send appropriate information to RPi) 
    1. RPi upon receiving information should either continue timer (Study Session continue) or display happy face (revert to default display)*
  5. At any point during the Study Session, user should also be able to end the Study Session (WebApp should send information to RPi)
    1. RPi upon receiving information should stop timer and then display happy face (revert to default display)*

* – indicates parts that Jeffrey is in charge of but I will help with

I also plan to have either the pyttsx3 library working properly such that the text-to-speech feature works on the WebApp, or have the gTTS feature working with minimal (<5s) processing time by pre-processing the user input into chunks and then generating mp3 files for each chunk in parallel while playing them sequentially.
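A minimal sketch of that fallback plan, synthesizing the chunks’ mp3 files in parallel while playing them in order (helper names are illustrative):

    import os
    from concurrent.futures import ThreadPoolExecutor
    from gtts import gTTS

    def synthesize(indexed_chunk):
        index, chunk = indexed_chunk
        path = f"chunk_{index}.mp3"
        gTTS(text=chunk, lang="en").save(path)   # network call to Google TTS
        return path

    def read_aloud(chunks):
        with ThreadPoolExecutor() as pool:
            # map() yields results in submission order, so playback stays
            # sequential even though synthesis overlaps in the background.
            for path in pool.map(synthesize, enumerate(chunks)):
                os.system(f"mpg123 {path}")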

For the RPS Game feature, I aim to ensure that the RPi can receive starting game details from the WebApp and that the WebApp can receive end game statistics to display appropriately.

Jeffrey: 

The timer code is able to tick up properly, but I have to ensure that pausing the timer (the user can pause it using the keypad) is synced with the WebApp. Furthermore, the time that the user inputs is stored in the Web App in a dictionary. I currently have code that extracts the study time from the duration (a key in the dictionary) and passes it into the study timer function, so the robot can display the time counting up on the DSI display. One mitigation is to put the pause functionality on the DSI display itself, as opposed to going through GPIO input -> RPi5 -> WebApp. Using the touchscreen decreases our reliance on hardware and makes it easier to debug via Tkinter and software.
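A sketch of the duration extraction and tick-up loop described above (the dictionary keys and display handling are illustrative):

    import time

    session = {"duration": 25, "subject": "ECE"}   # hypothetical WebApp payload

    def study_timer(minutes):
        elapsed = 0
        while elapsed < minutes * 60:
            time.sleep(1)
            elapsed += 1
            # The real code updates the Tkinter DSI display instead of printing.
            print(f"\r{elapsed // 60:02d}:{elapsed % 60:02d}", end="")

    study_timer(session["duration"])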


The RPS code logic is functional, but it needs to follow the flow chart from the design report: go from the “user is about to play a game” confirmation screen -> display rock/paper/scissors (using Tkinter) -> display the Win/Loss/Tie screen, or reset if no input is confirmed. Our goal is to use the keypad (up/down/left/right arrows) connected to the RPi5 to take in user input and output the result accordingly. One mitigation is to use the touchscreen of the DSI display to take in user input directly on the screen and send it to the WebApp.
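A minimal sketch of that screen flow using stacked Tkinter frames and tkraise() (screen names are illustrative):

    import tkinter as tk

    root = tk.Tk()
    screens = {}
    for name in ("confirm", "play", "result"):
        frame = tk.Frame(root)
        frame.grid(row=0, column=0, sticky="nsew")
        tk.Label(frame, text=f"{name} screen").pack()
        screens[name] = frame

    def show(name):
        screens[name].tkraise()   # raise the requested screen to the front

    show("confirm")   # confirm -> play -> result, or back to confirm on reset
    root.mainloop()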

Integration goals: 

  1. The TTS will be integrated with the speaker system. Mahlet and Shannon are working on the TTS, and Jeffrey will be working on outputting the TTS audio through the speaker.
  2. For the Web App, Jeffrey needs to be able to take in user input from the Web App (stored as JSON), parse it, and send inputs to functions such as the timer counting up, or handle the reverse, where a user action is sent to the WebApp (e.g., the user chose rock and won that round of RPS).


There have not been any changes to our schedule.

Shannon’s Status Report for 11/2/24

This week, I focused on the TTS feature of our robot. I spent some time trying to use gTTS (Google Text-to-Speech) on our WebApp, and it worked! We were able to take in input text on the WebApp and have it read out by the user’s computer. However, gTTS has a significant issue: latency. The gTTS library works by converting all of the text to an mp3 file, which the WebApp then plays for the user. The problem arises when a long piece of text is used as input. The specific details are also mentioned in the Team Weekly Report, but essentially the delay can be as long as 30s for a long text to be read, which is definitely a concern for our project. Our previous library, pyttsx3, speaks as it processes the text, so it had no scaling latency, unlike gTTS. Mahlet and I have agreed that we will still try to get pyttsx3 to work to avoid this significant latency issue, and if we still can’t get it to work by the end of next week, we will switch to gTTS and possibly split the input text into 150-200-word chunks, generating multiple mp3 files and playing them back-to-back.

I also worked on the WebSocket code for the Study Sessions this week. Following our success in having the RPi communicate with and respond to our WebApp, I wrote code on our WebApp’s Study Session feature to send over information about a Study Session when it is created, to see if the RPi can receive actual Study Session information and not just whether a button was clicked. Unfortunately, I have not had a chance to test this on the RPi yet, but I am confident it will work. I have also written some code, in preparation for being added to the RPi, to check whether the WebApp can receive information about paused Study Sessions; I plan to transfer it to the RPi when I am next available to work on it. Ideally, by the end of next week, all communication between the RPi and the WebApp will be working well enough to simulate a study session.


In the next week, I’ll be working on:

  • Researching a solution for pyttsx3 with Mahlet
  • Study Sessions communications between the WebApp and RPi through WebSockets
  • Starting Study Session timing code on the RPi

Shannon’s Status Report for 10/26/2024

This week, I focused on making WebSockets communication between our WebApp and the RPi work. When we met up on Thursday afternoon, Mahlet and Jeffrey helped to set up the RPi (registering the device, connecting it to WiFi, etc.). Once we were able to download VSCode on the RPi, I coded up a short script to test whether communication was possible. I wrote a simple script in JavaScript on the RPi, wrote a similar one with some extra UI features on the WebApp, and tested it out. Theoretically, when I clicked a button on the WebApp, the RPi should receive the event and print out a message. Initially, this wasn’t working because the RPi and the WebApp were on different ports: there was a CORS (Cross-Origin Resource Sharing) error, caused by the WebApp sending a request to a different domain than the server hosting it. To fix this, I included some CORS settings on the RPi side to allow the WebApp to send requests. This worked, and the RPi was able to display a message when a button on the WebApp was clicked.
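Our actual test script on the RPi was JavaScript, but for consistency here is the equivalent CORS setting sketched with python-socketio (the origin below is a hypothetical WebApp address):

    import socketio

    # Allow the WebApp's origin explicitly so its cross-port requests
    # are no longer rejected.
    sio = socketio.Server(cors_allowed_origins="http://localhost:8000")
    app = socketio.WSGIApp(sio)

    @sio.on("button_clicked")
    def button_clicked(sid, data):
        print("button on the WebApp was clicked:", data)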

On the WebApp:


On the RPi:



I also spent quite some time this week trying to incorporate TTS into the WebApp itself. Unfortunately, the pyttsx3 library that we were trying to use does not seem to work well with our website. After coding up some simple logic to call the library’s TTS function when user input is received, Mahlet and I tested it. When we first input some text into the textbox and click the read button, it works well, and the laptop speakers play the correct audio with little to no delay. However, when we try to send more text, we get the error “run loop has already started”, which indicates that the previously queued text-to-speech command had not finished. We were confused and spent quite some time trying to debug this by looking up solutions that other users who encountered this issue had tried, but none of them worked for us. We looked through the documentation for the TTS library itself and tried out various functions, but nothing seemed to work. Thus, Mahlet and I are looking into other TTS libraries to see if we can find a solution. I am considering gTTS (Google Text-to-Speech), which is not as ideal as pyttsx3 because it requires an internet connection, but it should be well-documented enough to reduce the chances of it failing in the same way.
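For reference, here is a minimal reproduction of the pattern that triggered the error for us, assuming one engine kept alive across requests (the web-framework wiring is omitted):

    import pyttsx3

    engine = pyttsx3.init()      # one engine reused for the whole app

    def speak(text):
        engine.say(text)
        engine.runAndWait()      # starts (and should end) the engine's run loop

    speak("first request works")
    # In our WebApp, a second call like this one raised
    # "run loop has already started", as if the first loop never ended.
    speak("second request failed for us")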

In the next week, I’ll be working on:

  • Building the robot with my team
  • Researching different solutions for TTS with Mahlet
  • RPS game function on the WebApp with WebSockets