Mahlet’s Status Report for 11/16/2024

This week, I finalized the audio localization mechanism.

Using MATLAB, I have been able to pinpoint the source of an audio cue with an error margin of 5 degrees. This also holds for our intended range of 0.9 meters (3 feet). This was tested using generated audio signals in simulation. The next step for audio localization is to integrate it with the microphone inputs. I take in an audio input signal and pass it through a bandpass filter to isolate the audio cue we are responding to. The pipeline then keeps track of the audio signal at each microphone for the past 1.5 seconds and uses the estimation mechanism to pinpoint the audio source.

In addition to this, I have 3D printed the mount design that connects the servo motor to the head of the robot. This will allow for seamless rotation of the robot head based on the detected input.

Another key accomplishment this week is the servo motor testing. I ran into some problems with our RPi’s compatibility with the recommended libraries. I have tested the servo at a few angles and have been able to get some movement, but the angle calculations based on the PWM signal are slightly inaccurate.
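As a reference point while debugging, here is a minimal sketch of the usual angle-to-duty-cycle mapping with RPi.GPIO software PWM. The 50 Hz frequency, the 2.5-12.5% duty range, and the pin number are typical hobby-servo assumptions rather than measured values for our servo; software PWM jitter may also account for part of the inaccuracy.

    # Hedged sketch of a standard angle -> PWM duty-cycle mapping (RPi.GPIO).
    # Pin number, 50 Hz frequency, and 2.5-12.5% duty range are assumptions.
    import RPi.GPIO as GPIO

    SERVO_PIN = 12                      # placeholder pin

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(SERVO_PIN, GPIO.OUT)
    pwm = GPIO.PWM(SERVO_PIN, 50)       # 50 Hz -> 20 ms period
    pwm.start(0)

    def set_angle(angle_deg):
        """Map 0-180 degrees onto a 2.5-12.5% duty cycle (0.5-2.5 ms pulse)."""
        duty = 2.5 + (angle_deg / 180.0) * 10.0
        pwm.ChangeDutyCycle(duty)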

The main steps for servo and audio neck accuracy verification are as follows.

Verification 

The audio localization testing in simulation has been conducted by generating signals in MATLAB. The function was able to accurately identify the audio cue’s direction. The next round of testing will be conducted on the microphone inputs and will go as follows:

  1. In a quiet setting, clap twice within a 3-foot (0.9 m) radius of the center of the robot.
  2. Take in the clap audio and isolate it from ambient noise through the bandpass filter. Inspect the result on a waveform viewer to verify the accuracy of the bandpass filter.
  3. Once the clap audio is isolated, use a waveform viewer to make sure the correct signals are being captured at each microphone.
  4. Measure the time it takes for this waveform to be correctly recorded, and save the signal for direction estimation.
  5. Use the estimate_direction function to identify the angle of the input.
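For reference, here is a minimal sketch of the bandpass stage from step 2; the sample rate and clap frequency band are illustrative assumptions, not values from our design.

    # Sketch of the bandpass step (step 2 above). Sample rate and clap band
    # are assumed values for illustration only.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    FS = 44_100                # assumed sample rate (Hz)
    LOW, HIGH = 2000, 5000     # assumed clap frequency band (Hz)

    def bandpass_clap(signal, fs=FS, low=LOW, high=HIGH, order=4):
        """Isolate the clap band from one microphone's raw buffer."""
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, signal)

    # Example: filter a 1.5-second buffer (placeholder noise here)
    raw = np.random.randn(int(1.5 * FS))
    filtered = bandpass_clap(raw)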

To test the servo motors, varying angle values in the range of 0 to 180 degrees will be applied. Due to the recent constraint on the robot’s neck motion, if the audio cue’s angle is between 180 and 270 degrees, the robot will turn to 180 degrees; if the angle is between 270 and 360 degrees, the robot will turn to 0 degrees.

  1. To verify the servo’s position accuracy, we will use an oscilloscope to check the servo’s PWM signal and ensure that the position changes proportionally with the commanded pulse width.
  2. The position will also be verified against visual indicators to ensure reasonable accuracy.

Once the servo position has been verified, the final step will be to connect the output of estimate_direction to the servo’s input_angle function.
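A hedged sketch of that connection is below. The names estimate_direction and input_angle come from this report, but their exact signatures, and the clamping helper, are assumptions based on the angle mapping described above.

    # Sketch of mapping the estimated source angle onto the 0-180 degree neck
    # range before commanding the servo. estimate_direction / input_angle are
    # the names used in this report; signatures are assumed.
    def clamp_to_neck_range(angle_deg):
        angle_deg %= 360
        if 180 < angle_deg <= 270:
            return 180        # source behind the left side: turn fully left
        if angle_deg > 270:
            return 0          # source behind the right side: turn fully right
        return angle_deg      # already within the reachable range

    # input_angle(clamp_to_neck_range(estimate_direction(mic_buffers)))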

My goals for next week are to:

  1. Accurately calculate the servo position
  2. Perform testing on the microphones per the verification methods mentioned above
  3. Translate the MATLAB code to Python for the audio localization
  4. Begin final SBB body integration

 

Team’s Status Report for 11/16/2024

Risk:
One risk that our team is currently addressing is the RPi5 acting as a server for the Web App. Our contingency plan is to use the DSI touch screen to ensure that features such as pause/resume/end study session are working, with button presses on the DSI display synced to the Web App. Right now, we are testing that we can communicate with the RPi5 via Socket.IO. In particular, we want the duration to be entered on the Web App to initiate a study session, and the timer display to start ticking on the DSI display. In the future, we plan to integrate this with the GPIO pins so the user can directly press keys on the robot body; the RPi will then emit an event describing what occurred, such as a pause/resume. After the emit, the WebSocket connection will listen for these events, update the Web App accordingly, and keep it in sync with the DSI display (controlled by the RPi5).
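A minimal sketch of the RPi side of this communication, using the python-socketio client, is shown below; the server address, event names, and timer stub are placeholders, not our final interface.

    # Hypothetical RPi-side Socket.IO client. Server URL, event names, and the
    # timer stub are assumptions for illustration.
    import socketio

    sio = socketio.Client()

    def start_dsi_timer(minutes):
        print(f"Timer started for {minutes} minutes")   # stand-in for the DSI timer

    @sio.on("start_session")
    def on_start_session(data):
        start_dsi_timer(int(data["duration"]))          # duration entered on the Web App

    def report_pause():
        # Called when pause is pressed (touchscreen now, GPIO buttons later)
        sio.emit("session_paused", {"source": "robot"})

    sio.connect("http://webapp.local:5000")             # assumed Web App address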

Validation:
We will use this process for all four of our features (Study Session, Text-to-Speech, RPS Game, Audio Response). Using the Study Session feature as an example, this is what our validation process looks like:

  • Have users go through our Study Session feature from start to end. Ideally, we want a good mix of 5-10 people, some who have used online study timers and some who have not.
    • We will have the different users try starting a Study Session on the WebApp (setting the name, duration, and type), pausing/resuming the session on the robot, and finally ending the session on the WebApp.
    • We will have them try the different types of study sessions (Standard vs. Pomodoro, which comes with built-in break time) and try ending a study session both before and after the set duration is up (when the study session reaches its goal duration, it should alert the user, ask whether they wish to continue, and let them do so).
    • At the end, we will have them fill out a feedback form focusing on
      • Any confusion/unintuitive use of the product (e.g. buttons too small, want certain information to be displayed elsewhere) – General question asked for all 4 features
      • Satisfaction with the WebApp and robot interactions (e.g. latency issues) – Specific question for Study Session, RPS Game, Text-to-Speech
      • If the user felt this feature would help them develop better time management for tasks and foster better studying habits – Specific question for Study Session
      • Overall suggestions for improvement of this feature – General question
  • For the Text-to-Speech feature,
    • Feedback form will also include a question asking:
      • How the feature was useful for audio learners
  • For our audio response feature, it would be similar except
    • During a study session, the robot will reset to the default position of 90 degrees, giving its full attention to the student.
    • During a study session break, the user can get the attention of the robot at a different location, and proceed with any task. The user can play the rock-paper-scissors game, or do any break time activity as desired.
    • Feedback form will also additionally include:
      • How the feature facilitates their study break (was it entertaining or distracting)
  • For the RPS Game feature,
    • Feedback form will also include:
      • How the feature facilitates their study break (was it entertaining, a good stress relief or not really useful/boring)

This will be a Google form with 4 sections to provide feedback on each feature.

Design changes:
There have been no design changes to our project.
Schedule changes:
There have been no changes to the schedule.

Jeffrey’s Status Report for 11/09/2024

For this week, I’ve been working on system integration, in particular between the RPi 5 and the DSI display. Over the course of the week, we have done a lot of work on connecting the RPi5 to the DSI display and ensuring that we can show the timer ticking up, as well as the break and home screens. Currently, we have the touch screen working, which is our mitigation plan: the user is able to pause, reset, and continue a study session straight from the DSI display. Our next goal is to have that information recorded by the Web App. Via sockets, we will be able to pause from the DSI display or, in the future, with the GPIO buttons. We will then be able to take in button inputs, directly pause or resume the study session, and have the Web App process the inputs with low latency.

Below are the pictures of the touch screen display working:

https://docs.google.com/document/d/1qfM2qQyuzhxMKzobmMc1_AZmue12hhRD4ch7_ocTri8/edit?usp=sharing

The three photos from top to bottom show:

  1. DSI display connection via ribbon connector
  2. Break time screen
  3. Timer counting up combined with the option to pause/resume/end study session

Furthermore, over the past week, we also built the robot base using laser cutting, and I have worked on Web App processing. Since Shannon has set up the Web App, my goal is to parse data from the JSON, take in inputs from the Web App, and send them to the RPi5. For instance, we want the user to be able to input a duration for the study session, and we have a function that parses the duration from the JSON dictionary and sends it to the timer function so the timer can count up for the study session.
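A minimal sketch of that parsing step is below; the payload shape (a top-level "duration" key) and the timer stub are assumptions for illustration.

    # Hypothetical parsing of the Web App's study-session JSON. The "duration"
    # key and the timer stub are assumed, not the final interface.
    import json

    def start_study_timer(minutes):
        print(f"Counting up toward {minutes} minutes")   # stand-in for the DSI timer

    def handle_start_message(payload: str):
        data = json.loads(payload)
        start_study_timer(int(data["duration"]))         # assumed key name

    handle_start_message('{"name": "Physics", "duration": 25, "type": "standard"}')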

 

In the upcoming week, my goal is to work on integrating the GPIO inputs with the RPi5 and Web App. Our goal is for GPIO button presses to be sent to the RPi5, and for that information to then be sent to the Web App via sockets so we can store it. For instance, we want button presses for the rock/paper/scissors options, and for the Web App to record the win/loss/tie accordingly, as well as decrement the number of remaining rounds.
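A hedged sketch of that GPIO-to-WebApp path is below; the pin numbers, event name, and server address are placeholders rather than our final wiring.

    # Hypothetical GPIO button handling that forwards RPS choices to the Web App.
    # Pin numbers, event name, and server address are assumptions.
    import RPi.GPIO as GPIO
    import socketio

    BUTTONS = {17: "rock", 27: "paper", 22: "scissors"}   # assumed BCM pins

    sio = socketio.Client()
    sio.connect("http://webapp.local:5000")               # assumed Web App address

    def on_press(channel):
        sio.emit("rps_choice", {"choice": BUTTONS[channel]})

    GPIO.setmode(GPIO.BCM)
    for pin in BUTTONS:
        GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
        GPIO.add_event_detect(pin, GPIO.FALLING, callback=on_press, bouncetime=200)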

Mahlet’s Status Report 11/09/2024

This week, I worked on the audio localization mechanism, servo initialization through the RPi, and ways of mounting the servo to the robot head for seamless rotation of the head.

Audio localization: 

I have a script that records audio for a specified duration, in our case 1.5 seconds at a time, takes in the input audio, and filters out the clap sound from the surroundings using a bandpass filter. The audio input from each mic is then passed into the function that performs the direction estimation by cross-correlating the signals between microphones.

I have finalized the mathematical approach using the four microphones. After calculating the time difference of arrival between each pair of microphones, I have been able to get close to the actual arrival differences, with slight variations. These variations cause very unstable direction estimates, with a margin of error of up to 30 degrees. In the coming week, I will be working on cleaning up this error to ensure a smaller margin of error and a more stable output.
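For reference, here is a minimal sketch of the cross-correlation TDOA step for a single microphone pair; the sample rate, spacing, and far-field angle formula are illustrative assumptions, not the full four-microphone estimator.

    # Minimal TDOA sketch for one microphone pair. Sample rate, mic spacing,
    # and the far-field assumption are illustrative only.
    import numpy as np
    from scipy.signal import correlate, correlation_lags

    FS = 44_100     # assumed sample rate (Hz)
    C = 343.0       # speed of sound (m/s)
    D = 0.08        # assumed mic spacing (m), e.g. the 8 cm robot width

    def tdoa(sig_a, sig_b, fs=FS):
        """Time difference of arrival (s) between two mic buffers."""
        corr = correlate(sig_a, sig_b, mode="full")
        lags = correlation_lags(len(sig_a), len(sig_b), mode="full")
        return lags[np.argmax(corr)] / fs

    def pair_angle(delay_s, d=D, c=C):
        """Far-field angle estimate (degrees) from one pair's delay."""
        return np.degrees(np.arcsin(np.clip(c * delay_s / d, -1.0, 1.0)))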

I also did some testing using only three of the microphones, placed at (0, 0), (0, x), and (y, 0), as an alternative approach, where x and y are the dimensions of the robot (x = 8 cm, y = 7 cm). This yields slightly less accurate results. I will work on fine-tuning the four-microphone setup and, as needed, modify the microphone positions to get the most accurate audio localization result.

Servo and the RPi: 

The Raspberry Pi comes with a package called python3-rpi.gpio, which provides access to all the GPIO pins on the Raspberry Pi. The servo motor connects to power, ground, and a GPIO pin that receives the signal. The signal wire connects to a PWM-capable GPIO pin to allow precise control over the signal sent to the servo; this pin can be GPIO12 or GPIO13.

After this, I specify that the pin is an output and then initialize the pin. I use the set_servo_pulsewidth function to set the servo’s pulse width based on the angle from the audio localization output.
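For reference, set_servo_pulsewidth is provided by the pigpio library rather than RPi.GPIO, so the sketch below assumes pigpio (with its daemon running); the 500-2500 µs pulse range is a typical hobby-servo value and should be checked against our servo’s datasheet.

    # Sketch of the servo drive described above, assuming the pigpio library
    # (set_servo_pulsewidth is a pigpio call). Pulse range is a typical value.
    import pigpio

    SERVO_PIN = 12                  # hardware-PWM-capable pin (GPIO12 or GPIO13)
    pi = pigpio.pi()                # connect to the pigpio daemon

    def input_angle(angle_deg):
        """Command the neck servo to an angle in [0, 180] degrees."""
        angle_deg = max(0, min(180, angle_deg))
        pulse_us = 500 + (angle_deg / 180.0) * 2000    # map 0-180 deg to 500-2500 us
        pi.set_servo_pulsewidth(SERVO_PIN, pulse_us)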

Robot Neck to servo mounting solution: 

I designed a bar to mount the robot’s head to the servo motor while it’s housed in the robot’s body. 

The CAD for this design is as follows.

By next week, I plan to debug the audio triangulation and minimize the margin of error. I will also 3D print the mount and integrate it with the robot, and begin integration testing of these systems.

 

 

Shannon’s Status Report 11/9/2024

This week, I worked on ensuring that Study Session information could be sent via WebSockets, and I managed to succeed in doing so. The WebApp can successfully send over information when the user creates a Study Session, and it can send over information that the user has ended a Study Session. As for robot-to-WebApp communication, because the pause button on the robot for Study Sessions has not yet been implemented and tested, I have not been able to test whether the code I wrote for receiving such an input through WebSockets works. Theoretically, upon the pause button being pressed, the RPi should send a message via WebSockets through something like socket.emit(“Session paused”), and upon receiving such a message, the WebApp display page will show “Study Session on Break” instead of “Study Session in progress”. Ideally, I wish to test this with the actual pause button being pressed on the robot, but if Jeffrey runs into issues implementing that in time, I will test it by having the RPi send the message 10 seconds after it receives the start session information to see if the code I have written actually works. In conclusion, WebApp-to-robot communication is working (Figure 1), and robot-to-WebApp communication needs testing on the WebApp end.

Figure 1: RPi receiving Study Session information from the WebApp.
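As a reference for the pause-message handling described above, here is a hedged WebApp-side sketch assuming Flask-SocketIO; the relayed event name and payload are placeholders, not the final interface.

    # Hypothetical Flask-SocketIO handler for the robot's pause message.
    # The relayed event name and payload are assumptions.
    from flask import Flask
    from flask_socketio import SocketIO, emit

    app = Flask(__name__)
    socketio = SocketIO(app)

    @socketio.on("Session paused")
    def handle_pause(data=None):
        # Relay the pause to connected browsers so the Study Session page can
        # swap "Study Session in progress" for "Study Session on Break".
        emit("session_status", {"status": "Study Session on Break"}, broadcast=True)

    if __name__ == "__main__":
        socketio.run(app)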

I also worked on the WebSocket code for the RPS Game this week. Unlike the Study Sessions, there is significantly less communication between the WebApp and the robot, so I only worked on this after I was confident about WebSockets working for our Study Session feature. For the RPS Game, all the WebApp has to do is send over the number of rounds the user wishes to play at the start of the game; all gameplay then occurs on the robot with the RPi, DSI display, and RPS game buttons. When the game ends, the robot sends back game statistics via WebSockets, which get displayed on the WebApp. I am able to send the RPS Game information to the robot with no issue, but I have yet to test the receiving of information and the display that should occur, which I will focus on next week. Same as before, WebApp-to-robot communication is working, and robot-to-WebApp communication needs testing.

For the TTS feature, I didn’t have as much time to work on it this week, but I have managed to implement the reading of a .txt file instead of just inputting text into a text field! A user can now upload a .txt file, and gTTS is able to read out its text.

Lastly, this week I also worked on the overall UI of our page to make sure that everything looks neater and more visually appealing. Previously, the links were all connected together and messy, but I have separated the header bar into individual sections with buttons, so the overall UI looks more professional. I will continue to work on improving the overall style of our website, but it now more closely resembles the mock-ups that I drew up in the design report.

According to the Gantt chart, I am on target.

In the next week, I’ll be working on:

  • Finishing up the Study Session feature
  • Finishing up TTS feature on the WebApp
  • Testing RPS Game statistics display on the WebApp
  • All Interim demo goals are listed under Team Report!

Team Status Report for 11/09/2024

Currently, the biggest risk is the overall system integration. Shannon has the WebApp functional, and Jeffrey has been working on unit testing individual parts of the code such as the RPS/DSI display. We will have to ensure that the overall process is smooth: the inputs from the GPIO pins on the robot must be processed by the RPi, the relevant information must then be sent to the Web App through WebSockets (so we can record information such as rock-paper-scissors win/loss/tie results), and the WebApp must then display the correct information based on what it received.

We will also need to perform some latency testing to ensure that this process happens with little delay (e.g., pausing from the robot is reflected promptly on the WebApp: the page should switch from Study Session in progress to Study Session on break almost instantly).

Due to the length and fragility of the ribbon connector between the display screen and the RPi, we have decided to limit the neck rotation to a range of 180 degrees. Translational motion is also limited because of this. Therefore, by the interim demo, we only intend to have the rotational motion, and depending on the flexibility of the ribbon connector, we will limit or remove the translational motion.

Interim demo goals:

Mahlet: 

  1. I will have working audio localization at, or close to, the 5-degree margin of error in simulation.
  2. I plan to have the correct audio input signals in each microphone, and integrate this input with the audio processing pipeline in the RPi.
  3. I will integrate the servo motor with the neck motion, and make sure the robot’s neck motion is working as desired.
  4. I will work with Shannon to ensure TTS functionality through gTTS and will do testing on pyttsx3 directly from RPi. 

Shannon: 

I aim to have the Study Session feature fully fleshed out for a standard Study Session, such that a user can 

  1. Start a Study Session on the WebApp (WebApp sends information to robot which starts timer)
  2. Pause it on the robot (and it reflects on the WebApp)
  3. When goal duration has been reached, the robot alerts WebApp and WebApp displays appropriate confirmation alert 
  4. User can choose to end the Study Session or continue on the WebApp (WebApp should send appropriate information to RPi) 
    1. RPi upon receiving information should either continue timer (Study Session continue) or display happy face (revert to default display)*
  5. At any point during the Study Session, user should also be able to end the Study Session (WebApp should send information to RPi)
    1. RPi upon receiving information should stop timer and then display happy face (revert to default display)*

* – indicates parts that Jeffrey is in charge of but I will help with

I also plan to have either the pyttsx3 library working properly so that the text-to-speech feature works on the WebApp, or the gTTS feature working with minimal (<5 s) processing time by pre-processing the user input into chunks and then generating mp3 files for each chunk in parallel while playing them sequentially.
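A minimal sketch of that chunking plan is below; the chunk size, worker count, and file naming are assumptions, and playback is left to whatever player ends up on the robot.

    # Hypothetical chunked gTTS pipeline: split the text, synthesize mp3 chunks
    # in parallel, and return the file paths in order for sequential playback.
    # Chunk size, worker count, and file names are assumptions.
    from concurrent.futures import ThreadPoolExecutor
    from gtts import gTTS

    def chunk_text(text, words_per_chunk=150):
        words = text.split()
        return [" ".join(words[i:i + words_per_chunk])
                for i in range(0, len(words), words_per_chunk)]

    def synth_chunk(indexed_chunk):
        index, chunk = indexed_chunk
        path = f"tts_chunk_{index}.mp3"
        gTTS(text=chunk).save(path)
        return path

    def synthesize(text):
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(synth_chunk, enumerate(chunk_text(text))))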

For the RPS Game feature, I aim to ensure that the RPi can receive starting game details from the WebApp and that the WebApp can receive end game statistics to display appropriately.

Jeffrey: 

The timer code is able to tick up properly, but I have to ensure that pausing the timer (the user can pause the timer using the keypad) is synced with the WebApp. Furthermore, the time that the user inputs is stored in the Web App in a dictionary. I currently have code that extracts the study time from the duration key in the dictionary and passes it into the study timer function, so the robot can display the time counting up on the DSI display. One mitigation is that we have the pause functionality on the DSI display itself, as opposed to the GPIO input -> RPi5 -> WebApp path. By using the touchscreen, we decrease reliance on hardware and make it easier to debug via Tkinter and software.

 

The RPS code logic is functional, but it needs to follow the flow chart from the design report: go from the confirm “user is about to play a game” screen -> display rock/paper/scissors (using Tkinter) -> display the Win/Loss/Tie screen, or reset if no input is confirmed. Our goal is to use the keypad (up/down/left/right arrows) connected to the RPi5 to take in user input and output the result accordingly. One mitigation is that we can use the touchscreen of the DSI display to take in user input directly on the screen and send it to the WebApp.

Integration goals: 

  1. The TTS will be integrated with the speaker system. Mahlet and Shannon are working on the TTS, and Jeffrey will be working on outputting the TTS audio through the speaker. 
  2. For the Web App, Jeffrey needs to be able to take in user input from the Web App (stored as JSON), parse it, and send the inputs to functions such as the timer counting up, or handle the reverse, where a user action is sent to the WebApp (e.g., the user chose rock and won that round of RPS). 

 

There have not been any changes to our schedule.

Jeffrey’s Weekly Status Report for 11/02/2024

For this week, I focused on code for certain functionalities. Since we had just gotten the robot base laser cut, we haven’t been able to piece it together to test properties such as motor movements. However, I have currently written code that will move the motors left or right based on the microphone readings, as follows:

Servo Motor Code

With the current code, our goal is to have the one servo motor rotate between -90 and 90 degrees. Once we are able to test with the physical servo motor connected to the DSI display, we can expand the mapping range and rotate the display appropriately. Once we have the code for pinpointing the sound based on the microphones, we can be more precise about how we rotate the DSI display. Furthermore, we also need to test physical sound inputs over varying time delays, to see if there are latency issues between when the robot hears the sound and when the DSI display rotates.

 

DSI Display Code

For the DSI display, we currently have code that helps us manage transitions between the possible display screen states. For instance, we need to handle transitions from the study screen to the break screen, or from the study screen to the RPS screen, etc. Currently, we have code that lets us work with three screen options: the study screen, break screen, and home screen. The next updates I have to make are the screens for the RPS game. These would include win, lose, and tie screens, as well as the appropriate robot celebration face to reflect the result of that round, i.e., a happy face if the user wins, neutral if it’s a tie, and sad if the player loses.
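A minimal sketch of that screen-state handling is below, using the standard tkinter frame-raising pattern; the screen names match this report, but the layout details are placeholders.

    # Minimal tkinter screen-state sketch: one frame per screen, raised on demand.
    # Layout details are placeholders.
    import tkinter as tk

    SCREENS = ("home", "study", "break")

    class Display(tk.Tk):
        def __init__(self):
            super().__init__()
            self.frames = {}
            for name in SCREENS:
                frame = tk.Frame(self)
                tk.Label(frame, text=f"{name} screen").pack(expand=True)
                frame.grid(row=0, column=0, sticky="nsew")
                self.frames[name] = frame
            self.show("home")

        def show(self, name):
            """Transition to another screen, e.g. study -> break."""
            self.frames[name].tkraise()

    # Example: app = Display(); app.show("study"); app.mainloop()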

 

In the upcoming week, we plan to focus on assembling the robot base, so we can start putting parts in it, such as the speakers for TTS, microphones, and servo motors. I am slightly behind schedule on revising the RPS logic from last week. I still have to edit some parts and focus on integration with the WebApp, since the WebApp determines how many rounds to play. By completing the logic, we can focus on the HTML display for all the possible screen options. From there, we can start testing to ensure that the latency of the DSI display is low enough to transition between different screens.

Shannon’s Status Report for 11/2/24

This week, I focused on the TTS feature of our robot. I spent some time trying to use gTTS (Google Text-to-Speech) on our WebApp, and it worked! We were able to take in input text on the WebApp and have it read out by the user’s computer. However, gTTS has a significant issue, which is the latency of the feature. The gTTS library works by converting all the text to an mp3 file, and then the WebApp plays the mp3 file for the user. The problem thus arises when a long piece of text is used as input. The specific details are also mentioned in the Team Weekly Report, but essentially the delay can be as long as 30 s for a long piece of text to be read. As such, this is definitely a concern for our project. Our previous library, pyttsx3, translates text as it processes it, so there was no scaling latency associated with it, unlike gTTS. Mahlet and I have agreed that we will still try to get pyttsx3 to work to avoid this significant latency issue from gTTS, and if we still can’t get it to work by the end of next week, we will switch to using gTTS and possibly split the input text into 150-200 word chunks, generate multiple mp3 files, and play them back-to-back.

I also worked on the WebSocket code for the Study Sessions this week. Following our success in having the RPi communicate and respond to our WebApp, this week I wrote some code on our WebApp Study Session feature to have it send over information about a Study Session when it is created, to see if the RPi can receive the Study Session details and not just information about whether a button was clicked. Unfortunately, I have not had a chance to test this on the RPi yet, but I am confident it will work. I have also written some code, to be added to the RPi, to see if the WebApp can receive information about paused Study Sessions; I plan on transferring it to the RPi when I am next able to work on it. Ideally, by the end of next week, all communication between the RPi and the WebApp will be working well enough to simulate a study session occurring. 


In the next week, I’ll be working on:

  • Researching a solution for pyttsx3 with Mahlet
  • Study Sessions communications between the WebApp and RPi through WebSockets
  • Starting Study Session timing code on the RPi

Mahlet’s Status Report for 11/02/2024

This week, I worked on building the robot base structure. Based on the CAD drawing we did earlier in the semester, I generated parts for the robot base and head that have finger edge joints. This allows for easy assembly, so we can disassemble the box to modify the parts on the inside and easily reassemble it. The box looks as follows: 

During this process, I used the 1/8-inch hardwood boards we purchased and cut out every part of the body. The head and the body are separate, as they will be connected with a rod to allow for easy rotation and translational motion. This rod will be mounted to the servo motor. As a reminder, the CAD drawing looks as follows.

I laser cut the boxes and assembled each part separately. Inside the box, we will be placing the motors, RPi, and speakers. The wiring of the buttons will also be placed in the body of the robot. The “feet” of the robot will be key inputs, which haven’t been delivered yet. The result so far looks as follows: 

       

 

In addition to these, I worked on the TTS functionality with Shannon. I did some tests and found that the pyttsx3 library works when running text input iterations outside of the WebApp. The functionality we are testing is feeding the text input directly into the text-to-speech engine, which kept causing the run-loop error. When I tested pyttsx3 in a separate file where I pass in various texts back to back and only initialize the engine once, it works as expected. 

We also worked on the gTTS library. The way this works is that it generates an MP3 file from the text input and then reads it out once generation is done. This file generation causes very high latency: for a thousand words, it takes over 30 seconds to generate the file. Despite this, we came up with a plan to break the input into multiple chunks and create the MP3 files in parallel, lowering the latency. This would get us a faster TTS time without the issues we saw with the pyttsx3 library. This is the better and fully functional alternative among our options, with the reasonable tradeoff of slightly longer latency for longer texts in exchange for a reliable TTS engine.

In the coming week, I will be working mainly on finalizing the audio triangulation along with some testing, and I will begin integrating the servo system with the audio response together with Jeffrey.

Team Status Report for 11/02/2024

The most significant risk is the TTS functionality. The pyttsx3 library has been causing issues when text input is directed to it from the WebApp: a “run loop has already started” error occurs when trying to read new text after the first submission. As such, we have looked into alternatives, including Google TTS (gTTS). Upon trying gTTS out, we realized that while it does work successfully, it takes a significant amount of time to read long pieces of text aloud. Short texts with <20 words take an insignificant amount of time (2-3 s), but longer pieces of text we tried, such as 1000 words, can take approximately 30 s, and a standard page of text from a textbook took roughly 20 s. These time delays are quite significant and are due to the fact that gTTS converts all the text to an mp3 first, and then the mp3 is played on the WebApp, whereas the previous TTS engine we wanted to use, pyttsx3, converts the text to speech as it reads the input, and so performs much better. We also tried installing another TTS library (simply called the TTS Python library) as a potential alternative for our purpose. We found that the package is very large, and when we tried installing it on a local computer, it took hours and still wasn’t complete. We are concerned about the size of the library, as we have limited space on the RPi. This library supports 1100 languages, and it takes very long to install. We plan to keep it in mind as a potential alternative, but as of now, the gTTS library is the better option.

One potential risk with the DSI display to RPi5 connection is that we aren’t able to connect via the HDMI port; our goal is to use the MIPI DSI port. On the Amazon website, there is an example video of connecting the display directly to the RPi5, showing that the display is driver-free and compatible with the RPi signal (the RPi OS should automatically detect the display resolution). The display is 800×480 pixels; if our port isn’t working, we can set the resolution directly in the config with hdmi_cvt=800 480 60 6. These values represent the horizontal and vertical resolution in pixels, the refresh rate in hertz, and the aspect ratio, respectively. 

As an update to our previous report’s concern about not having built the robot base: this week, we managed to laser-cut the robot base out of wood. Since the base is designed to be assembled and disassembled easily, it allows easy access to and modification of the parts in the circuit. For photos and more information about this, refer to Mahlet’s Status Report.  

There are no changes to our schedule.