Shannon’s Status Report 12/7/24

This week, since I was done with everything I was in charge of, I focused on helping Jeffrey catch up on his parts. Jeffrey had gotten the Study Session partially working: when a user clicks Start Session on the WebApp, the robot's display screen starts a timer. However, he was unable to get pause/resume or end session to work, so I sat down with him and we worked on these features together. I noticed that he had plenty of debugging statements on the robot, which was good, but he wasn't really logging anything on the WebApp. Following my advice, he added logging there, and the logs showed that the event was being emitted from the RPi but never received on the WebApp. He also had a log for when the WebSocket connected on the start page but not on the in-progress page, so I recommended adding one there. Once he did, we saw that the connection log on the in-progress page only appeared after a noticeable delay; when we waited for it to appear and then tested the pause/resume button, it worked. That told us the issue was connection latency. We resolved it by switching the Socket.IO transport from polling to WebSockets, which fixed the latency issue. Next, we debugged why End Session on the WebApp was not being received on the robot end. Again based on the logs, I deduced that the events were being emitted under the wrong name, so the robot's event handlers could not catch them. We changed the robot to listen for the event the WebApp was actually sending, and it worked – we finished a standard study session! This was done on Wednesday. After this, I took over all TTS and Study Session related work, while Jeffrey will work on the last part of hardware integration with the pause/resume buttons.
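
To illustrate the transport fix, here is a minimal sketch of the RPi-side client, assuming python-socketio on the RPi and Flask-SocketIO on the WebApp; the address and event names are placeholders, not our exact ones.

    import socketio

    sio = socketio.Client()

    @sio.event
    def connect():
        print("RPi connected to WebApp")          # the connection log we added while debugging

    @sio.on("pause_session")
    def on_pause(data):
        print("Pause received:", data)            # handler name must match the emitted event exactly

    # Forcing the WebSocket transport skips the long-polling upgrade that caused the delay.
    sio.connect("http://<webapp-host>:5000", transports=["websocket"])
    sio.wait()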

On Thursday, I integrated my TTS code and ran into some issues with the RPi trying (and failing) to play audio from a "default" device. After some quick troubleshooting, I decided to write a config file to set the default device to the port that the speaker was plugged into, and it worked! The text-to-speech feature was working well, and there was barely any detectable difference in latency between the audio playing on the user's computer and on the RPi through the speakers. I tested it some more to make sure it worked with various different .txt files, and when I was satisfied with the result, I moved on to integration with the Study Session feature. I worked on making sure TTS worked only while a study session was in progress (not paused). I wrote and tested some code, but it was buggy and I could not get the redirection to work the way I wanted. The desired behavior was: if no Study Session has been created yet, clicking the text-to-speech feature on the WebApp redirects the user to the create-a-new-study-session page, and if a Study Session exists but is paused, it redirects them to that session. I also wanted to handle the case where the goal duration is reached while the user is on the text-to-speech page: when they go back to the study session, the pop-up asking whether they would like to continue or end the session (since the target duration has been reached) should still appear.
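
Since the fix was a config file rather than code, here is a hedged sketch of what it might look like, assuming an ALSA config at /etc/asound.conf (or ~/.asoundrc); the card/device numbers are assumptions, and `aplay -l` shows the real ones for the speaker's port.

    defaults.pcm.card 1
    defaults.pcm.device 0
    defaults.ctl.card 1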

On Friday, I continued to work on these features and managed to debug them successfully. The text-to-speech feature worked well alongside the Study Session and exhibited the behavior I wanted. I then worked on the Pomodoro Study Session feature, which gives the user a built-in break based on the study and break intervals they set at the start of the study session. I worked on both the WebApp and the RPi, ensuring that the study and break intervals were sent correctly. To make the intervals actually take effect, I had to create another background clock that runs alongside the display clock. The standard study session only needed a display clock, since the user is in charge of pausing and resuming sessions, but the Pomodoro study session has to keep track of when to pause and resume the session automatically while playing a break reminder audio, so it requires a separate clock. I wrote and debugged this code and it worked well. This is also when I discovered that the text-to-speech feature could not run at the same time as the break reminder audio, so I made a slight design change to prevent the two from overlapping.
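
As a rough illustration of that separate clock, here is a minimal threading-based sketch; the interval handling and callback names are assumptions, not my exact implementation.

    import threading

    def pomodoro_clock(study_min, break_min, on_break_start, on_break_end, stop_event):
        """Background clock that toggles study/break automatically, alongside the display clock."""
        while not stop_event.is_set():
            if stop_event.wait(timeout=study_min * 60):    # study interval (returns True if the session was ended)
                return
            on_break_start()                               # e.g. pause display clock + play break reminder audio
            if stop_event.wait(timeout=break_min * 60):    # break interval
                return
            on_break_end()                                 # resume the display clock

    stop = threading.Event()
    threading.Thread(target=pomodoro_clock,
                     args=(25, 5, lambda: print("break"), lambda: print("resume"), stop),
                     daemon=True).start()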

Today, I will work on cleaning up the WebApp code, help Jeffrey with any RPS issues that he may face, and start on the final poster.

According to the Gantt chart and the newly linked final week schedule, I am on target.

In the next week I will work on:

  • Final overall testing with my group
  • Final poster, video, demo and report

Shannon’s Status Report for 11/30/24

This week, I focused on finishing up the TTS feature on the robot. Since the feature works well on the WebApp, I decided to integrate it fully with the robot's speakers. I first ensured that the user's input text could be sent properly via WebSockets to the robot. Once this was achieved, I used the Google Text-to-Speech (gTTS) library on the RPi to convert the text into an mp3 file, and then tried to have the mp3 file play through the speakers. On my personal computer (a MacBook), the line to play audio is os.system(f"afplay {audio_file}"). However, since the RPi runs Linux, this does not work, and I tried os.system(f"xdg-open {audio_file}") instead. This played the audio file but also opened a command terminal for the VLC media player, which is not what I wanted, since the user would not be able to continue playing audio files unless they quit that terminal first. I therefore looked up other ways to play the audio file, which led me to os.system(f"mpg123 {audio_file}"). It worked well and played the audio with no issues. I timed the latency and it was mostly under 3s for a 50-word piece of text. If the text is longer and is broken into 50-word chunks, the first chunk takes slightly longer, but the subsequent chunks are mostly under 2.5s, which is in line with our use case and design requirements. With this, the text-to-speech feature is mostly finished. There is still one small issue: for a better user experience, I wanted the WebApp to display when a chunk of text was done being read, but it currently cannot. After some debugging, I found that this is because the WebApp tries to display the message before the WebSocket callback function has returned. Since the function is asynchronous, I would have to use threading on the WebApp if I still want this display to appear. I might not keep this detail, because introducing threading could cause other issues, and the user can tell when a chunk is done being read from the audio itself. Nevertheless, the text-to-speech feature now works on the robot: the user can input a .txt file, the robot reads out the first x words, and each time the user clicks continue it reads out the next x words, so I think this feature is ready for the final demo.
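
For reference, here is a minimal sketch of the RPi-side playback path described above, assuming gTTS and the mpg123 command-line player are installed; the file path and function name are illustrative.

    import os
    from gtts import gTTS

    def speak(text, audio_file="/tmp/tts_chunk.mp3"):
        gTTS(text=text, lang="en").save(audio_file)   # convert the text chunk to an mp3
        os.system(f"mpg123 -q {audio_file}")          # play through the speakers, no VLC terminal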

 

According to the Gantt chart, I am on target. 

 

In the next week, I’ll be working on:

  • Helping Jeffrey finish up the Study Session feature
  • Finishing up any loose ends on the WebApp (deployment, code clean-up, etc.)

 

For this week’s additional question:

I had to learn how to use TTS libraries such as pyttsx3 and gTTS. I thoroughly reviewed their respective documentation at https://pypi.org/project/pyttsx3/ and https://pypi.org/project/gTTS/ to understand how to configure their settings and integrate them into my project. When debugging issues, I relied on online forums like Stack Overflow, which provided insights from others who had encountered similar problems. For example, when I encountered the run loop error, I searched for posts describing similar scenarios and experimented with the suggested solutions. It was there that I saw someone recommend gTTS instead, explaining that this issue would be avoided because, unlike pyttsx3, gTTS does not use a speech engine: it converts the text to mp3 files first and then plays them, rather than converting and speaking as it goes. This led me to switch over to gTTS, which is what we used in the end.

I also had to learn WebSockets for real-time communication between the RPi and the WebApp. I read through the Socket.IO documentation at https://socket.io/docs/v4/, which was great for understanding how the communication process works. It also taught me how to set up a server and client, manage events, and handle acknowledgments. For debugging, I used tools I had previously learned in other classes, such as the Chrome browser developer tools console and the VSCode debugger with breakpoints and logpoints, which allowed me to diagnose CORS issues and verify whether events were being emitted and received based on the logs/errors displayed.
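
As a quick illustration of the server/event/acknowledgment pattern I picked up from the docs, here is a hedged sketch assuming Flask-SocketIO on the WebApp; the CORS setting and event name are placeholders rather than our exact configuration.

    from flask import Flask
    from flask_socketio import SocketIO

    app = Flask(__name__)
    socketio = SocketIO(app, cors_allowed_origins="*")   # loosening CORS avoids the errors seen in the browser console

    @socketio.on("session_started")
    def handle_session_started(data):
        print("Received:", data)
        return "ack"                                      # the return value is delivered to the emitter as an acknowledgment

    if __name__ == "__main__":
        socketio.run(app, host="0.0.0.0", port=5000)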

Shannon’s Status Report 11/16/24

This week, I worked on WebSockets with Jeffrey during our Tuesday and Thursday meet-ups. Initially, I spent some time helping Jeffrey set up his virtual environment, making sure he had access to our GitHub repository, and ultimately getting our WebApp running on his computer so that he could test the DSI display showing the correct information based on the WebApp inputs. Jeffrey later ran into some git commit issues that I also worked with him to resolve (he had accidentally committed the virtual environment folder to our GitHub repository, resulting in more than 1 million lines being committed and leaving him unable to git pull due to the sheer volume of content). Unfortunately, as of right now, we are still running into issues using Socket.IO to have the WebApp communicate with the RPi and display. Previously, we were able to communicate with just the RPi itself; however, when drawing up the DSI display using Tkinter, Jeffrey ran into issues communicating between the WebApp and the RPi. While WebApp-to-RPi communication works, the other direction does not, and he is still working to resolve this issue. As such, although I was hoping to test the WebApp display based on RPi-sent information, I was unable to do so since communication from the RPi to the WebApp is still buggy. Hopefully Jeffrey resolves this issue in the next week, and I will be able to test the code I have written more thoroughly.

 

I have also worked on improving the latency of the TTS feature. Previously, upon a large file upload, the TTS would take a long time to process the text before speaking. I have therefore changed the TTS interface to include an option for the user to choose how many words they want in a "part". If a user inputs 50, then when they click "Start Reading!", the uploaded .txt file is processed, the text is split into 50-word parts, and the first 50-word part is read. After reading, the website displays "Part 1 read successfully" and a new button appears, saying "Continue?". If the user clicks it, the next 50-word part is read. Once all parts have been read, the message "Finished reading input text, upload new .txt file to read new input." appears and the "Continue?" button disappears.

Screenshots in the original post show this flow:

  • User uploads a .txt file; the default word count is set to 50 (latency of ~2s).
  • Reading a 50-word text with the maximum word limit set to 25 words (read in two parts).
  • After the first part (25-word chunk) is read successfully, the "Continue?" button appears.
  • After the second part is read successfully, the "Continue?" button disappears and the finished message appears.
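
A minimal sketch of the word-chunking step described above (the function name and defaults are illustrative, not my exact code):

    def split_into_parts(text, words_per_part=50):
        words = text.split()
        return [" ".join(words[i:i + words_per_part])
                for i in range(0, len(words), words_per_part)]

    # e.g. a 50-word file with the limit set to 25 yields two parts
    parts = split_into_parts(open("input.txt").read(), words_per_part=25)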

Lastly, this week I also worked on a slight UI improvement for the website. When a user ends a study session, there is now star confetti! (Similar to Canvas submissions).

According to the Gantt chart, I am on target and have finished all individual components that I am responsible for. All other tasks I am involved in are collaborative, with either Jeffrey (WebSockets – RPi receiving and sending info) or Mahlet (TTS on the robot) in charge. Although I am slightly behind on what I initially planned to show at the interim demo (the Study Session is not fully working), everything else I had planned to talk about is working.

In the next week, I’ll be working on:

  • Helping Jeffrey finish up the Study Session feature
  • Helping Jeffrey to start RPS Game feature
  • Implementing TTS feature on the Robot with Mahlet

Shannon’s Status Report 11/9/2024

This week, I worked on ensuring that Study Session information could be sent via WebSockets, and I succeeded in doing so. The WebApp can successfully send over information when the user creates a Study Session, and it can send over information that the user has ended a Study Session. As for robot-to-WebApp communication, because the pause button on the robot for the Study Session has not yet been implemented and tested, I have not been able to test whether the code I wrote for receiving such an input over WebSockets works. Theoretically, when the pause button is pressed, the RPi should send a message via WebSockets through something like socket.emit("Session paused"), and upon receiving that message, the WebApp display page should show "Study Session on Break" instead of "Study Session in progress". Ideally, I want to test this with the actual pause button being pressed on the robot, but if Jeffrey runs into issues implementing that in time, I will test it by having the RPi send the message 10 seconds after it receives the start session information, just to see if the code I have written actually works. In conclusion, WebApp-to-robot communication is working (Figure 1), and robot-to-WebApp communication needs testing on the WebApp end.

Figure 1: RPi receiving Study Session information from the WebApp.
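
A hedged sketch of the pause notification path described above, reusing the client and server setups sketched in the reports further up (python-socketio on the RPi, Flask-SocketIO on the WebApp); the event name and payload are placeholders.

    # RPi side (python-socketio Client `sio`, connected as in the client sketch above):
    sio.emit("session_paused", {"session_id": 1})        # sent when the pause button is pressed

    # WebApp side (Flask-SocketIO instance `socketio`, set up as in the server sketch above):
    @socketio.on("session_paused")
    def handle_session_paused(data):
        socketio.emit("update_display", {"status": "Study Session on Break"})   # browser page switches to the break view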

I also worked on the WebSocket code for the RPS Game this week. Unlike the Study Sessions, there is significantly less communication between the WebApp and the robot, so I only worked on this after I was confident about WebSockets working for our Study Session feature. For the RPS Game, all the WebApp has to do is send over the number of rounds the user wishes to play at the start of the game; all gameplay then occurs on the robot with the RPi, DSI display, and RPS game buttons. When the game ends, the robot sends back game statistics via WebSockets, which get displayed on the WebApp. I am able to send the RPS Game information to the robot with no issue, but I have yet to test the receiving of statistics and the display that should occur, which I will focus on next week. As before, WebApp-to-robot communication is working, and robot-to-WebApp communication needs testing.

For the TTS feature, I didn't have as much time to work on it this week, but I managed to implement reading from a .txt file instead of just inputting text into a text field! A user can now upload a .txt file and gTTS is able to read its text.

Lastly, this week I also worked on the overall UI of our page to make sure that everything looks neater and more visually appealing. Previously, the links were all connected together and messy, but I have separated the header bar into individual sections with buttons and so the overall UI looks more professional. I will continue to work on improving the overall style of our website, but now it more closely resembles the mock-ups that I drew up in the design report.

According to the Gantt chart, I am on target.

In the next week, I’ll be working on:

  • Finishing up the Study Session feature
  • Finishing up TTS feature on the WebApp
  • Testing RPS Game statistics display on the WebApp
  • All Interim demo goals are listed under Team Report!

Team Status Report for 11/09/2024

Currently the biggest risk is to overall system integration. Shannon has the WebApp functional, and Jeffrey has been unit testing individual pieces of code such as the RPS/DSI display. We will have to ensure that the overall pipeline is smooth: inputs from the GPIO pins on the robot must be processed by the RPi, the relevant information must then be sent to the WebApp through WebSockets (so we can record information such as rock-paper-scissors win/loss/tie results), and the WebApp must then display the correct information based on what it received.

We will also need to perform some latency testing to ensure that this process happens with little delay (e.g., a pause from the robot is reflected promptly on the WebApp – the page should switch from "Study Session in progress" to "Study Session on break" almost instantly).

Due to the display screen to RPi ribbon connector’s length and fragility, we have decided to limit the neck rotation to a range of 180 degrees. In addition, translational motion is also limited because of this. Therefore, by the interim demo, we only intend to have the rotational motion, and depending on the flexibility of the ribbon connector, we will limit or get rid of the translational motion. 

Interim demo goals:

Mahlet: 

  1. I will have working audio localization within or close to the 5-degree margin of error in simulation. 
  2. I plan to have the correct audio input signals in each microphone, and integrate this input with the audio processing pipeline in the RPi.
  3. I will integrate the servo motor with the neck motion, and make sure the robot’s neck motion is working as desired.
  4. I will work with Shannon to ensure TTS functionality through gTTS and will do testing on pyttsx3 directly from RPi. 

Shannon: 

I aim to have the Study Session feature fully fleshed out for a standard Study Session, such that a user can 

  1. Start a Study Session on the WebApp (WebApp sends information to robot which starts timer)
  2. Pause it on the robot (and it reflects on the WebApp)
  3. When goal duration has been reached, the robot alerts WebApp and WebApp displays appropriate confirmation alert 
  4. User can choose to end the Study Session or continue on the WebApp (WebApp should send appropriate information to RPi) 
    1. RPi upon receiving information should either continue timer (Study Session continue) or display happy face (revert to default display)*
  5. At any point during the Study Session, user should also be able to end the Study Session (WebApp should send information to RPi)
    1. RPi upon receiving information should stop timer and then display happy face (revert to default display)*

* – indicates parts that Jeffrey is in charge of but I will help with

I also plan to have either the pyttsx3 library working properly such that the text-to-speech feature works on the WebApp, or have the gTTS feature working with minimal (<5s) processing time by pre-processing the user input into chunks and then generating mp3 files for each chunk in parallel while playing them sequentially.
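
A hedged sketch of that gTTS fallback: split the input into chunks and generate an mp3 per chunk in parallel, keeping the files in order so the WebApp can play them sequentially. The worker count, paths, and names are assumptions.

    from concurrent.futures import ThreadPoolExecutor
    from gtts import gTTS

    def synthesize_chunks(chunks, out_dir="/tmp"):
        def synth(pair):
            i, text = pair
            path = f"{out_dir}/chunk_{i}.mp3"
            gTTS(text=text, lang="en").save(path)
            return path
        with ThreadPoolExecutor(max_workers=4) as pool:
            # map() preserves input order, so the mp3s come back ready to play back-to-back
            return list(pool.map(synth, enumerate(chunks)))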

For the RPS Game feature, I aim to ensure that the RPi can receive starting game details from the WebApp and that the WebApp can receive end game statistics to display appropriately.

Jeffrey: 

The timer code is able to tick up properly, but I have to ensure that pausing the timer (the user can pause it using the keypad) is synced with the WebApp. Furthermore, the time that the user inputs is stored in the WebApp in a dictionary. I currently have code that extracts the study time from the duration (a key in that dictionary) and passes it into the study timer function, so the robot can display the time counting up on the DSI display. One mitigation is to put the pause functionality on the DSI display itself, as opposed to GPIO input -> RPi5 -> WebApp. By using the touchscreen, we decrease reliance on hardware and make it easier to debug via Tkinter and software.
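
A rough sketch of the count-up display timer, assuming Tkinter on the RPi; the dictionary key and label layout are illustrative, not Jeffrey's exact code.

    import tkinter as tk

    session = {"duration": 25}        # minutes, as parsed from the WebApp's dictionary

    root = tk.Tk()
    label = tk.Label(root, font=("Arial", 48), text="00:00")
    label.pack()
    elapsed = 0

    def tick():
        global elapsed
        elapsed += 1
        label.config(text=f"{elapsed // 60:02d}:{elapsed % 60:02d}")
        if elapsed < session["duration"] * 60:
            root.after(1000, tick)    # schedule the next tick in one second

    root.after(1000, tick)
    root.mainloop()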

 

RPS code logic is functional, but it needs to follow the flow chart from the design report: go from the confirm "user is about to play a game" screen -> display rock/paper/scissors (using Tkinter) -> display the Win/Loss/Tie screen, or reset if no input is confirmed. Our goal is to use the keypad (up/down/left/right arrows) connected to the RPi5 to take in user input and output the result accordingly. One mitigation is to utilize the DSI display's touchscreen to take in user input directly on the screen and send it to the WebApp. 

Integration goals: 

  1. The TTS will be integrated with the speaker system. Mahlet and Shannon are working on the TTS and Jeffrey will be working on outputting the TTS audio through the speaker. 
  2. For the Web App, Jeffrey needs to be able to take in user input from the Web App (stored as JSON), parse it, and send inputs to functions such as the timer counting up, or handle the reverse, where a user action is sent to the WebApp (e.g., the user chose rock and won that round of RPS). 

 

There have not been any changes to our schedule.

Shannon’s Status Report for 11/2/24

This week, I focused on the TTS feature of our robot. I spent some time trying to use gTTS (Google Text-to-Speech) on our WebApp, and it worked! We were able to take in input text on the WebApp and have it read out by the user's computer. However, gTTS has a significant issue, which is latency. The gTTS library works by converting all of the text to an mp3 file, and then the WebApp plays the mp3 file for the user. The problem arises when a long piece of text is used as input. The specific details are also mentioned in the Team Weekly Report, but essentially the delay can be as long as 30s for a long piece of text to be read, which is definitely a concern for our project. Our previous library, pyttsx3, speaks as it processes the text, so it did not have this scaling latency. Mahlet and I have agreed that we will still try to get pyttsx3 to work to avoid gTTS's significant latency issue, and if we still can't get it working by the end of next week, we will switch to gTTS and possibly split the input text into 150-200-word chunks, generating multiple mp3 files and playing them back-to-back.

I also worked on the WebSocket code for the Study Sessions this week. Following our success in having the RPi communicate with and respond to our WebApp, I wrote code for the WebApp's Study Session feature to send over information about a Study Session when it is created, to check that the RPi can receive actual Study Session information and not just whether a button was clicked. Unfortunately, I have not had a chance to test this on the RPi yet, but I am confident it will work. I have also written some code, in preparation for the RPi, to see if the WebApp can receive information about paused Study Sessions; I plan on transferring it to the RPi when I am next able to work on it. Ideally, by the end of next week, all communication between the RPi and the WebApp will be working well enough to simulate a study session occurring. 


In the next week, I’ll be working on:

  • Researching a solution for pyttsx3 with Mahlet
  • Study Sessions communications between the WebApp and RPi through WebSockets
  • Starting Study Session timing code on the RPi

Team Status Report for 11/02/2024

The most significant risk is the TTS functionality. The pyttsx3 library has been causing issues when text input comes directly from the WebApp: a "run loop has already started" error occurs when trying to read new text after the first submission. As such, we have looked into alternatives, including Google TTS (gTTS). Upon trying gTTS out, we realized that while it does work, it takes a significant amount of time to read long pieces of text aloud. Short texts of <20 words take an insignificant amount of time (2-3s), but longer pieces of text, such as a 1000-word sample we tried, can take up to approximately 30s, and a standard page of text from a textbook took roughly 20s. These delays are quite significant, and they stem from the fact that gTTS converts all of the text to an mp3 first, which is then played on the WebApp, whereas pyttsx3, the TTS engine we originally wanted to use, converts the text to speech as it reads the input and so performs much better here. We also tried installing another TTS library (the Python package simply called TTS) as a potential alternative. We found that the package is very large; when we tried installing it on a local computer, it took hours and still wasn't complete. We are concerned about the size of the library since we have limited space on the RPi. It supports 1100 languages, which likely explains the long install time. We plan to keep it in mind as a potential alternative, but as of now, gTTS is the better option.

One potential risk with the DSI display to RPi5 connection is that we are not able to connect via the HDMI port; our goal is to use the MIPI DSI port. The Amazon product page has an example video of connecting the DSI display directly to the RPi5, showing that the display is driver-free and compatible with the RPi signal (the RPi OS should automatically detect the display resolution). The display is 800×480 pixels; if the port isn't working, we can set the resolution directly in the boot configuration with hdmi_cvt=800 480 60 6, which specifies the horizontal and vertical resolution in pixels, the refresh rate in hertz, and the aspect ratio, respectively. 

As an update to our previous report concerns about not having built the robot base, this week, we have managed to laser-cut the robot base out of wood. Since the base is designed to be assembled and disassembled easily, it allows for easy parts access/modification to the circuit. For photos and more information about this, refer to Mahlet’s Status Report.  

There are no changes to our schedule.

Mahlet’s Status Report for 10/26/2024

This week I worked on the forward audio triangulation method with the real-life scale in mind. I limited the bounds of the audio source to 5 feet from each side of the robot's base and placed the microphones at a closer distance, using accurate real-world units to make my approximation possible. Using this, and knowing the sound source location, I was able to pinpoint the source of the audio cue. I also stepped over the grid dimensions at a finer scale to get a closer approximation, which keeps the inaccuracy in the direction the robot turns toward low. 

I randomly generated the audio source location, and some of the simulation plots were included in the original post; in them, the red circles denote the audio source and the cross marks the estimated location.

After this, I pivoted from audio triangulation and focused on other tasks: setting up the Raspberry Pi, doing TTS tests with Shannon, and learning about the WebSocket connection methodology. I joined Shannon and Jeffrey's session when they discussed the WebSocket approach and learned about it from them.

While setting up the Raspberry Pi, I ran into some issues trying to SSH into it; setting up folders and the basics, however, went well. One task for next week is to reach out to the department for more information about prior connections with this Raspberry Pi. It is already connected to the CMU Secure and CMU Devices networks, but it doesn't seem to be working on the CMU Devices network. I tried registering the device with CMU Devices, but it appears to have been registered prior to this semester. I aim to figure out the SSH issue over the next week. However, we can still work with the RPi using a monitor, so this is not a big issue. 

After this, I worked on Text-to-Speech with Shannon, using the pyttsx3 library. We intended for the WebApp to read various texts back to back through the text/file input mechanism. The library works by initializing a speech engine and then calling engine.say() to read the text input. This works when running the app for the first time; however, from the second input onwards it gets stuck in a loop. The built-in engine.stop() function requires re-initializing the text engine multiple times, which causes the WebApp to lag. As a result, Shannon and I have decided to look into other TTS libraries for Python, and we will also try testing TTS directly on the RPi first instead of on the WebApp.
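
For context, a minimal sketch of the pyttsx3 pattern described above; as a standalone script this runs fine, but when the say()/runAndWait() cycle is driven repeatedly by WebApp submissions, that is where the "run loop already started" error appeared for us.

    import pyttsx3

    engine = pyttsx3.init()          # initialize the speech engine once
    engine.say("First submission reads fine.")
    engine.runAndWait()              # blocks until speech finishes

    engine.say("Reading a second input repeats the same cycle.")
    engine.runAndWait()              # inside our WebApp handlers, this repeated cycle triggered the run-loop error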

My progress is on track; the only setback is the late arrival of ordered parts. As described in the team weekly report, I will be using the slack time to make progress on assembling the robot and integrating systems. 

Next week I will be working on finalizing the audio triangulation, working with Shannon to find the optimal TTS functionality, and working with Jeffrey to build the hardware.