Mahlet’s Status Report for 12/07/2024

This week mainly consisted of debugging my audio localization solution and making the necessary changes to SBB's hardware.

Hardware

Based on the decision to change motors from servo to stepper, I had to change the mechanism that mounts the robot's head to the body. I was able to reuse most of the components from the previous version, and only had to make the mounting stand slightly longer to stay in line with our use-case requirement. The robot can now move its head smoothly and consistently.

My work on audio localization and its integration with the neck rotation mechanism has made significant progress, though some persistent challenges remain. Below is a detailed breakdown of my findings and ongoing efforts.

To evaluate the performance of the audio localization algorithm, I conducted simulations using a range of true source angles from 0° to 180°. The algorithm produced estimated angles that closely align with expectations, achieving a mean absolute error (MAE) of 2.00°. This MAE was calculated by comparing the true angles with the estimated angles and provides a clear measure of the algorithm’s accuracy. The result confirms that the algorithm performs well within the intended target of a ±5° margin of error.
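A minimal sketch of the MAE computation used for this check (the angle values below are placeholders, not the actual simulation data):

```python
import numpy as np

# Placeholder angles; the real sweep covered true source angles from 0 to 180 degrees.
true_angles = np.array([0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0])
estimated_angles = np.array([1.8, 31.5, 57.9, 92.1, 118.4, 151.2, 178.3])

# Mean absolute error between true and estimated angles.
mae = np.mean(np.abs(true_angles - estimated_angles))
print(f"MAE: {mae:.2f} degrees")
```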

To measure computational efficiency, I used Python’s time library to record the start and end times for the algorithm’s execution. Based on these measurements, the average computation time for a single audio cue is 0.0137 seconds. This speed demonstrates the algorithm’s capability to meet real-time processing requirements.
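The measurement pattern was roughly the following sketch; estimate_direction and the random buffers are stand-ins for the actual localization call and microphone data:

```python
import time
import numpy as np

def estimate_direction(signals):
    # Stand-in for the actual localization routine.
    return float(np.argmax(np.sum(np.abs(signals), axis=1)))

mic_signals = np.random.randn(4, 1024)  # placeholder microphone buffers

start = time.perf_counter()
angle = estimate_direction(mic_signals)
elapsed = time.perf_counter() - start
print(f"Computation time: {elapsed:.4f} s")  # reported average: ~0.0137 s per cue
```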

In integrating audio localization with the neck rotation mechanism, I observed both promising results and challenges that need to be addressed.

For audio cue detection, I tested the microphones to identify claps as valid signals. These signals were successfully detected when they exceeded an Arduino ADC threshold of 600. Upon detection, these cues are transmitted to the Raspberry Pi (RPi) for angle computation. However, the integration process revealed inconsistencies in serial communication between the RPi and the Arduino.
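On the RPi side, reading the Arduino's clap notifications could look roughly like the sketch below; the port name, baud rate, and message format are assumptions for illustration, not the exact protocol:

```python
import serial  # pyserial

# Port and baud rate are placeholders.
ser = serial.Serial("/dev/ttyACM0", 115200, timeout=1)

def wait_for_clap_cue():
    """Block until the Arduino reports an ADC reading above the clap threshold."""
    while True:
        line = ser.readline().decode(errors="ignore").strip()
        if not line:
            continue
        samples = [int(v) for v in line.split(",") if v.isdigit()]
        if samples and max(samples) > 600:   # threshold noted above
            return samples                   # hand off to angle computation
```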

While the typical serial communication latency is 0.2 seconds or less, occasional delays ranging from 20 to 35 seconds have been observed. These delays disrupt the system's responsiveness and make it challenging to collect reliable data. The root cause could be the Arduino's continuous serial write operation, which conflicts with its role in receiving data from the RPi. The data received on the RPi appears to be handled correctly, but I will validate it side by side to make sure the values are accurate. Attempts to visualize the data on the computer side were too slow for the 44 kHz sampling rate, leaving gaps in real-time analysis.

To address hardware limitations, I have temporarily transitioned testing to a laptop due to USB port issues with the RPi. However, this workaround has not resolved the latency issue entirely.

Despite these challenges, the stepper motor has performed as expected. The motor's rotation from 0° to 180° was measured at 0.95 seconds, which meets the target of under 3 seconds, assuming typical latency.

Progress is slightly behind schedule, and the contingency plan for this is indicated in the Google Sheet of the team weekly report.

Next Steps

Resolving the serial communication latency is my highest priority. I will focus on optimizing the serial read and write operations on both the Arduino and the RPi to prevent delays. Addressing the RPi's USB port malfunction is another critical task, as it will enable me to move testing back to the intended hardware; otherwise, I will resort to the contingency plan of using the web app to compute the data. I will finalize all the tests I need for the report and complete integration with my team over the final week.

Mahlet’s Status Report for 11/30/2024

As we approach the final presentation of our project, my main focus has been preparing for the presentation, as I will be presenting in the coming week.

In addition to this, I have assembled the robot’s body, and made necessary modifications to the body to make sure every component is placed correctly. Below are a few pictures of the changes so far. 

I have modified the robot's face so that it can encase the display screen; previously, the head was a solid box. The servo-to-head mount is now properly assembled, and the head is well balanced on the stand that the motor is mounted to. This leaves space to place the Arduino, speaker, and Raspberry Pi accordingly. I have also mounted the microphones to the corners as desired.

Before picture: 

After picture: 

Mounted microphones on to the robot’s body

Assembled Body of the robot

Assembled body of the robot including the display screen

 

I have been able to detect a clap cue using the microphone by identifying the threshold of a loud enough clap detectable by the microphone. I do this processing on the Raspberry Pi: once the RPi detects the clap, it runs the signal through the direction estimate function, which outputs the angle. This angle is then sent to the Arduino to drive the motor that turns the robot's head. Due to the late arrival of our motor parts, I haven't been able to test the integration of the motor with the audio input. This put me a little behind, but using the slack time we allocated, I plan to finalize this portion of the project within the coming week.
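At a high level, the RPi-side flow is sketched below; the threshold value, serial port, and estimate_direction stub are placeholders rather than the exact implementation:

```python
import numpy as np
import serial  # pyserial

CLAP_THRESHOLD = 0.5  # placeholder amplitude threshold for a loud-enough clap

def estimate_direction(mic_buffers):
    # Stand-in for the TDOA-based direction estimate; returns an angle in degrees.
    return 90.0

arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # assumed port/baud

def on_new_audio(mic_buffers):
    """mic_buffers: array of shape (num_mics, num_samples)."""
    if np.max(np.abs(mic_buffers)) > CLAP_THRESHOLD:   # clap detected
        angle = estimate_direction(mic_buffers)
        arduino.write(f"{angle:.1f}\n".encode())        # Arduino turns the head
```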

I also worked on implementing the software aspect of the RPS game; once the keypad inputs are appropriately detected, I will meet with Jeffrey to integrate these two functionalities.

I briefly worked with Shannon to make sure the audio output for the TTS through the speaker attached to the RPi works properly. 

 

Next week: 

  1. Finalize the integration and testing of audio detection + motor rotation
  2. Finalize the RPS game with keypad inputs by meeting with the team. 
  3. Finalize the overall integration of our system with the team. 

Some new things I learned during this capstone project are how to use serial communication between an Arduino and a Raspberry Pi, which I picked up from online Arduino resources that clearly teach how to do this. I also learned how to perform signal analysis on audio inputs to localize the source of a sound within a range, and how to use the concept of time difference of arrival to get my system working. I used some online resources about signal processing and discussed with my professors to clarify any misunderstandings I had about my approach. I also learned from online resources, Shannon, and Jeffrey how a WebSocket works. Even though my focus was not really on the web app to RPi communication, it was good to learn how their systems work.

Mahlet’s Status Report for 11/16/2024

This week, I was able to successfully finalize the audio localization mechanism. 

Using MATLAB, I have been able to successfully pinpoint the source of an audio cue with an error margin of 5 degrees. This also holds for our intended range of 0.9 meters (3 feet) and was tested using generated audio signals in simulation. The next step for the audio localization is to integrate it with the microphone inputs. I take in an audio input signal and pass it through a bandpass filter to isolate the audio cue we are responding to. The system then keeps track of the audio signal at each microphone for the past 1.5 seconds and uses the estimation mechanism to pinpoint the audio source.
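The bandpass step translates to Python roughly as follows; the cutoff frequencies and sampling rate here are illustrative, not the tuned values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 44100               # sampling rate (Hz)
LOW, HIGH = 2000, 8000   # illustrative clap band; actual cutoffs are tuned separately

def bandpass(signal, fs=FS, low=LOW, high=HIGH, order=4):
    """Isolate the clap band from a raw microphone buffer."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

# Keep the last 1.5 s of filtered audio per microphone for direction estimation.
buffer_len = int(1.5 * FS)
```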

In addition to this, I have 3D printed the mount design that connects the servo motor to the head of the robot. This will allow for a seamless rotation of the robot head, based on the input detected. 

Another key accomplishment this week is the servo motor testing. I ran into some problems with our RPi's compatibility with the recommended libraries. I have tested the servo on a few angles and have been able to get some movement, but the calculations based on the PWM are slightly inaccurate.

The main steps for servo and audio neck accuracy verification are as follows.

Verification 

The audio localization testing on simulation has been conducted by generating signals in matlab. The function was able to accurately identify the audio cue’s direction. The next testing will be conducted on the microphone inputs. This testing will go as follows: 

  1. In a quiet setting, clap twice within a 3-foot radius from the center of the robot. 
  2. Take in the clap audio and filter out ambient noise through the bandpass filter. Measure the result on a waveform viewer to verify the accuracy of the bandpass filter. 
  3. Once the clap audio is isolated, make sure correct signals are being passed into each microphone using a waveform viewer. 
  4. Get the time it takes for this waveform to be correctly recorded, and save the signal to estimate direction.
  5. Use the estimate direction function to identify the angle of the input. 

To test the servo motor, varying angle values between 0 and 180 degrees will be applied. Due to the recent constraint on the robot's neck motion, if the audio cue's angle is between 180 and 270 degrees, the robot will turn to 180 degrees, and if the angle is between 270 and 360 degrees, the robot will turn to 0 degrees.
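The mapping described above reduces to a small helper, sketched here:

```python
def clamp_neck_angle(source_angle_deg):
    """Map a 0-360 degree source angle onto the robot's 0-180 degree neck range."""
    a = source_angle_deg % 360
    if a <= 180:
        return a      # directly reachable
    elif a <= 270:
        return 180    # 180-270 degrees: turn to 180
    else:
        return 0      # 270-360 degrees: turn to 0
```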

  1. To verify the servo’s position accuracy, we will use an oscilloscope to verify the servo’s PWM, and ensure proportional change of position relative to time. 
  2. This will also be verified using visual indicators, to ensure reasonable accuracy. 

Once the servo position has been verified, the final step would be to connect the output of the estimate_direction to the servo’s input_angle function. 

My goals for next week are to:

  1. Accurately calculate the servo position
  2. Perform testing on the microphones per the verification methods mentioned above
  3. Translate the MATLAB code to Python for the audio localization
  4. Begin final SBB body integration

 

Mahlet’s Status Report for 11/09/2024

This week, I worked on the audio localization mechanism, servo initialization through the RPi, and ways of mounting the servo to the robot's head for seamless rotation.

Audio localization: 

I have a script that records audio for a specified duration (in our case, every 1.5 seconds), takes in the input audio, and filters out the clap sound from the surroundings using a bandpass filter. The audio input from each mic is then passed into the function that performs the direction estimation by cross-correlating the signals from each pair of microphones.

I have finalized the mathematical approach using the four microphones. After calculating the time difference of arrival between each pair of microphones, I have been able to get close to the actual input arrival differences, with slight variations. These variations currently cause very unstable direction estimates, with a margin of error of up to 30 degrees. In the coming week, I will be working on cleaning up this error to ensure a smaller margin of error and a more stable output.
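The pairwise TDOA step is essentially a cross-correlation peak search, as in this simplified sketch (the full code also accounts for the four-microphone geometry):

```python
import numpy as np

FS = 44100  # sampling rate (Hz)

def tdoa(sig_a, sig_b, fs=FS):
    """Estimate the time difference of arrival between two microphone signals.
    A positive result means sig_a arrives later than sig_b."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)
    return lag / fs
```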

I also did some testing using only three of the microphones in the orientation (0, 0), (0, x), (y, 0) as an alternative approach, where x and y are the dimensions of the robot (x = 8 cm, y = 7 cm). This yielded slightly less accurate results. I will be working on fine-tuning the four microphones and, as needed, I will modify the microphone positions to get the most optimal audio localization result.

Servo and the RPi: 

The Raspberry Pi has a library called python3-rpi.gpio, which provides access to all the GPIO pins on the Raspberry Pi. The servo motor connects to power, ground, and a GPIO pin that receives the signal. The signal wire connects to a PWM-capable GPIO pin to allow precise control over the signal sent to the servo; this pin can be GPIO12 or GPIO13.

After this, I specify that the pin is an output and then initialize it. I use the set_servo_pulsewidth function to set the servo's pulse width based on the angle from the audio localization output.
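A minimal sketch of this setup, assuming the pigpio daemon is running (set_servo_pulsewidth is the pigpio API call; the pin number and pulse-width mapping are illustrative):

```python
import pigpio

SERVO_PIN = 12  # PWM-capable pin (GPIO12 or GPIO13)

pi = pigpio.pi()                       # connect to the pigpio daemon
pi.set_mode(SERVO_PIN, pigpio.OUTPUT)  # configure the pin as an output

def set_angle(angle_deg):
    """Map 0-180 degrees onto a typical 500-2500 microsecond servo pulse."""
    pulse = 500 + (angle_deg / 180.0) * 2000
    pi.set_servo_pulsewidth(SERVO_PIN, pulse)

set_angle(90)  # e.g., the angle returned by the audio localization step
```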

Robot Neck to servo mounting solution: 

I designed a bar to mount the robot’s head to the servo motor while it’s housed in the robot’s body. 

The CAD for this design is as follows.

By next week, I plan to debug the audio triangulation and minimize the margin of error. I will also 3D print the mount and integrate it with the robot, and begin integration testing of these systems.

 

 

Team Status Report for 11/09/2024

Currently, the biggest risk is to the overall system integration. Shannon has the WebApp functional, and Jeffrey has been working on unit testing individual parts of the code such as the RPS/DSI display. We will have to ensure that the overall process is smooth: the inputs from the GPIO pins on the robot must be processed by the RPi, the relevant information must then be sent to the WebApp through WebSockets (so we can record information such as rock-paper-scissors win/loss/tie results), and the WebApp must display the correct information based on what it received.
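As a rough illustration of the RPi-to-WebApp direction of this flow, a game result could be pushed over a WebSocket like the sketch below; the endpoint URL, message fields, and the websockets client library are assumptions, not our finalized protocol:

```python
import asyncio
import json
import websockets  # assumed client library

async def report_rps_result(result):
    """Send an RPS game result from the RPi to the WebApp over a WebSocket."""
    async with websockets.connect("ws://webapp.local:8000/ws/robot") as ws:  # placeholder URL
        await ws.send(json.dumps({"event": "rps_result", "outcome": result}))
        ack = await ws.recv()  # the WebApp can acknowledge or reply with display info
        print("WebApp replied:", ack)

# asyncio.run(report_rps_result("win"))
```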

We will also need to perform some latency testing to ensure that this process happens with little delay (e.g., pausing on the robot should be reflected promptly on the WebApp: the page should switch from Study Session in progress to Study Session on break almost instantly).

Due to the length and fragility of the display-screen-to-RPi ribbon connector, we have decided to limit the neck rotation to a range of 180 degrees. Translational motion is also limited because of this. Therefore, by the interim demo we only intend to have rotational motion, and depending on the flexibility of the ribbon connector, we will limit or remove the translational motion.

Interim demo goals:

Mahlet: 

  1. I will have working audio localization within, or close to, the 5-degree margin of error in simulation. 
  2. I plan to have the correct audio input signals in each microphone, and integrate this input with the audio processing pipeline in the RPi.
  3. I will integrate the servo motor with the neck motion, and make sure the robot’s neck motion is working as desired.
  4. I will work with Shannon to ensure TTS functionality through gTTS and will do testing on pyttsx3 directly from RPi. 

Shannon: 

I aim to have the Study Session feature fully fleshed out for a standard Study Session, such that a user can 

  1. Start a Study Session on the WebApp (WebApp sends information to robot which starts timer)
  2. Pause it on the robot (and it reflects on the WebApp)
  3. When goal duration has been reached, the robot alerts WebApp and WebApp displays appropriate confirmation alert 
  4. User can choose to end the Study Session or continue on the WebApp (WebApp should send appropriate information to RPi) 
    1. RPi upon receiving information should either continue timer (Study Session continue) or display happy face (revert to default display)*
  5. At any point during the Study Session, user should also be able to end the Study Session (WebApp should send information to RPi)
    1. RPi upon receiving information should stop timer and then display happy face (revert to default display)*

* – indicates parts that Jeffrey is in charge of but I will help with

I also plan to have either the pyttsx3 library working properly such that the text-to-speech feature works on the WebApp, or have the gTTS feature working with minimal (<5s) processing time by pre-processing the user input into chunks and then generating mp3 files for each chunk in parallel while playing them sequentially.

For the RPS Game feature, I aim to ensure that the RPi can receive starting game details from the WebApp and that the WebApp can receive end game statistics to display appropriately.

Jeffrey: 

The timer code is able to tick up properly, but I have to ensure that pausing the timer (the user can pause the timer using the keypad) is synced with the WebApp. Furthermore, the duration that the user inputs is stored in the WebApp in a dictionary. I currently have code that extracts the study time from the duration (a key in the dictionary) and passes it into the study timer function, so the robot can display the time counting up on the DSI display. One mitigation is to put the pause functionality on the DSI display itself, as opposed to GPIO input -> RPi5 -> WebApp. Using the touchscreen decreases reliance on hardware and makes it easier to debug via Tkinter and software.

 

The RPS code logic is functional, but it needs to follow the flow chart from the design report: go from the confirm “user is about to play a game” screen -> display rock/paper/scissors (using Tkinter) -> display the Win/Loss/Tie screen, or reset if no input is confirmed. Our goal is to use the keypad (up/down/left/right arrows) connected to the RPi5 to take in user input and output the result accordingly. One mitigation is to use the touchscreen of the DSI display to take user input directly on the screen and send it to the WebApp.

Integration goals: 

  1. The TTS will be integrated with the speaker system. Mahlet and Shannon are working on the TTS and Jeffrey will be working on outputting the TTS audio through the speaker. 
  2. For the Web App, Jeffrey needs to be able to take in user input from the Web App (stored as JSON), parse it, and send inputs to functions such as the timer counting up, or the reverse, where a user action is sent to the WebApp (e.g., the user chose rock and won that round of RPS). 

 

There have not been any changes to our schedule.

Mahlet’s Status Report for 11/02/2024

This week, I worked on building the robot's base structure. Based on the CAD drawing we did earlier in the semester, I generated parts for the robot's base and head with finger edge joints, which allow for easy assembly. This way we can disassemble the box to modify the parts inside and easily reassemble it. The box looks as follows: 

During this process, I used the 1/8-inch hardwood boards we purchased and cut out every part of the body. The head and the body are separate, as they will be connected with a rod to allow for easy rotation and translational motion. This rod will be mounted to the servo motor. As a reminder, the CAD drawing looks as follows.

I laser cut the boxes and assembled each part separately. Inside the box, we will place the motors, RPi, and speakers; the wiring of the buttons will also be placed in the body of the robot. The “feet” of the robot will be key inputs, which haven't been delivered yet. The results so far look as follows: 

       

 

In addition to these, I worked on the TTS functionality with Shannon. I ran some tests and found that the pyttsx3 library works when running text input iterations outside of the WebApp. The functionality we are testing feeds the text input directly into the text-to-speech engine, which kept causing the loop error. When I tested pyttsx3 in a separate file, passing in various texts back to back while initializing the engine only once, it worked as expected.
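The pattern that worked in the standalone test looks roughly like this sketch: initialize the engine once, then reuse it for successive inputs:

```python
import pyttsx3

engine = pyttsx3.init()  # initialize the TTS engine once

texts = ["First test sentence.", "Second test sentence.", "Third one."]
for text in texts:
    engine.say(text)     # queue the utterance
    engine.runAndWait()  # block until playback finishes, then continue
```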

We also worked on the gTTS library. It works by generating an MP3 file for the text input and then playing it back once the file is done. This file generation causes very high latency: for a thousand words, it takes over 30 seconds to generate the file. Despite this, we came up with a plan to break the text into multiple chunks and create the MP3 files in parallel, lowering the latency. This would give us a faster TTS time without the issues we saw with the pyttsx3 library, making it a fully functional alternative among our options, with the reasonable tradeoff of slightly longer latency on longer texts in exchange for a reliable TTS pipeline.
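The chunking idea could look something like the sketch below; the chunk size, file naming, and thread-based parallelism are assumptions rather than a settled implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from gtts import gTTS

def synthesize_chunk(args):
    index, text = args
    path = f"chunk_{index}.mp3"
    gTTS(text).save(path)   # generate one MP3 per chunk
    return path

def tts_in_chunks(full_text, words_per_chunk=100):
    words = full_text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    # Generate the MP3 files in parallel; playback then happens in order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(synthesize_chunk, enumerate(chunks)))
```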

In the coming week, I will be working mainly on finalizing the audio triangulation along with some testing, and will begin integrating the servo system with the audio response together with Jeffrey.

Mahlet’s Status Report for 10/26/2024

This week I worked on the forward audio triangulation method with the real-life scale in mind. I limited the bounds of the audio source to 5 feet from each side of the robot's base and placed the microphones at a closer distance, using accurate units to make my approximation possible. Using this, and knowing the sound source location, I was able to pinpoint the source of the audio cue. I also swept the grid with a smaller step size to get a closer approximation, which keeps the inaccuracy in the direction the robot turns to low.
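Conceptually, the grid sweep works like the sketch below; the grid bounds, step size, and microphone coordinates are illustrative stand-ins:

```python
import numpy as np

MIC_POSITIONS = np.array([[0.0, 0.0], [0.08, 0.0], [0.0, 0.07], [0.08, 0.07]])  # meters
SPEED_OF_SOUND = 343.0  # m/s

def locate_source(measured_tdoas, step=0.05, bound=1.5):
    """Return the grid point whose predicted TDOAs (relative to mic 0)
    best match the measured ones."""
    best_point, best_err = None, np.inf
    for x in np.arange(-bound, bound, step):
        for y in np.arange(-bound, bound, step):
            dists = np.linalg.norm(MIC_POSITIONS - np.array([x, y]), axis=1)
            predicted = (dists - dists[0]) / SPEED_OF_SOUND
            err = np.sum((predicted[1:] - np.asarray(measured_tdoas)) ** 2)
            if err < best_err:
                best_point, best_err = (x, y), err
    return best_point
```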

I randomly generate the audio source location, and below are some of the simulations for this. The red circles and the cross mark the true and estimated audio source locations in each simulation.

After this, I pivoted from audio triangulation and focused on other tasks, such as setting up the Raspberry Pi, running TTS tests with Shannon, and learning about the WebSocket connection methodology. I joined Shannon and Jeffrey's session when they discussed the WebSocket approach and learned about it there.

While setting up the Raspberry Pi, I ran into some issues when trying to SSH into it. Setting up folders and the basics, however, went well. One task for next week is to reach out to the department to get more information about prior connections with the Raspberry Pi. It is already connected to the CMU Secure and CMU Devices networks; however, it doesn't seem to be working on the CMU Devices network. I tried registering the device to CMU Devices, but it seems it was registered prior to this semester. I aim to figure out the issue with SSH-ing into this device over the next week. However, we can still work with the RPi using a monitor, so this is not a big issue.

After this, I worked on text-to-speech with Shannon using the pyttsx3 library. We intended for the WebApp to read various texts back to back through the text/file input mechanism. The library works by initializing a TTS engine and calling engine.say() to read the text input. This works when running the app for the first time; however, from the second input onwards, it gets stuck in a loop. The built-in engine.stop() function requires the text engine to be re-initialized multiple times, which causes the WebApp to lag. As a result, Shannon and I have decided to look into other TTS libraries for Python, and we will also try testing the TTS directly on the RPi instead of the WebApp first.

My progress is on track; the only setback is the late arrival of ordered parts. As described in the team weekly report, I will be using the slack time to accommodate progress on assembling the robot and integrating systems.

Next week, I will work on finalizing the audio triangulation, work with Shannon to find the optimal TTS functionality, and work with Jeffrey to build the hardware.