Mahlet’s Status Report for 11/30/2024

As we approach the final presentation of our project, my main focus has been preparing the presentation, since I will be presenting in the coming week. 

In addition to this, I have assembled the robot’s body and made the necessary modifications to ensure every component is placed correctly. Below are a few pictures of the changes so far. 

I have modified the robot’s face so that it can encase the display screen; previously, the head was a solid box. The servo-to-head mount is now properly assembled, and the head is well balanced on the stand that the motor is mounted to. This leaves space to place the Arduino, speaker, and Raspberry Pi accordingly. I have also mounted the microphones to the corners as desired. 

Before picture: 

After picture: 

Microphones mounted onto the robot’s body

Assembled Body of the robot

Assembled body of the robot including the display screen

 

I have been able to detect a clap cue with the microphone by identifying an amplitude threshold for a sufficiently loud clap. This processing runs on the Raspberry Pi: once the RPi detects the clap, it runs the signal through the direction-estimate function, which outputs an angle. The angle is then sent to the Arduino, which drives the motor to turn the robot’s head. Due to the late arrival of our motor parts, I haven’t been able to test the integration of the motor with the audio input. This put me a little behind, but using the slack time we allocated, I plan to finalize this portion of the project within the coming week.
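Below is a minimal sketch of this pipeline, assuming a 4-channel capture via the sounddevice library and a pyserial link to the Arduino; the port name, threshold value, and estimate_direction() are placeholders, not our final values.

```python
# Minimal sketch of the clap -> angle -> Arduino pipeline (names are placeholders).
import numpy as np
import sounddevice as sd   # assumed audio-capture library on the RPi
import serial              # pyserial, for the RPi -> Arduino link

FS = 44100                 # sample rate (Hz)
CLAP_THRESHOLD = 0.6       # normalized amplitude threshold, tuned by experiment

def estimate_direction(frames: np.ndarray) -> float:
    """Placeholder for the TDOA-based direction-estimate function (returns degrees)."""
    raise NotImplementedError

ser = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # assumed Arduino port

while True:
    # Record a short multi-channel segment from the four microphones.
    frames = sd.rec(int(0.5 * FS), samplerate=FS, channels=4)
    sd.wait()

    # A clap is "loud enough" if any channel crosses the threshold.
    if np.max(np.abs(frames)) > CLAP_THRESHOLD:
        angle = estimate_direction(frames)
        # Send the angle to the Arduino as a newline-terminated string.
        ser.write(f"{angle:.1f}\n".encode())
```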

Another thing I worked on is implementing the software aspect of the RPS game, and once the keypad inputs are appropriately detected, I will meet with Jeffrey to integrate these two functionalities. 

I briefly worked with Shannon to make sure the audio output for the TTS through the speaker attached to the RPi works properly. 

 

Next week: 

  1. Finalize the integration and testing of audio detection + motor rotation
  2. Finalize the RPS game with keypad inputs by meeting with the team. 
  3. Finalize the overall integration of our system with the team. 

Some new things I learned during this capstone project are how to use serial communication between an Arduino and a Raspberry Pi; I used some online Arduino resources that clearly teach how to do this. I also learned how to perform signal analysis on audio inputs to localize the source of a sound within a range, and how to use the concept of time difference of arrival to get my system working. For this I used online resources about signal processing and discussed with my professors to clarify any misunderstandings in my approach. I also learned from online resources, Shannon, and Jeffrey how a WebSocket works. Even though my focus was not really on the web app to RPi communication, it was good to learn how their systems work.

Team Status Report for 11/09/2024

Currently the biggest risk is to overall system integration. Shannon has the WebApp functional, and Jeffrey has been unit testing individual parts of the code such as the RPS logic and DSI display. We will have to ensure that the end-to-end flow is smooth: the inputs from the GPIO pins on the robot must be processed by the RPi, the relevant information must be sent to the WebApp through WebSockets (so we can record information such as rock-paper-scissors win/loss/tie results), and the WebApp must then display the correct information based on what it received.
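As a rough sketch of the RPi-to-WebApp step, the snippet below sends a JSON game-result message over a WebSocket using Python’s websockets library; the endpoint URL and message fields are illustrative assumptions, not our final schema.

```python
# Hedged sketch: report an RPS result from the RPi to the WebApp over a WebSocket.
import asyncio
import json
import websockets

async def report_rps_result(result: str) -> None:
    uri = "ws://webapp.local:8000/ws/robot/"   # assumed WebApp endpoint
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"event": "rps_result", "result": result}))
        ack = await ws.recv()                  # assumes the WebApp acknowledges receipt
        print("WebApp replied:", ack)

# e.g. after a game ends on the robot:
# asyncio.run(report_rps_result("win"))
```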

We will also need to perform some latency testing to ensure that this process happens with little delay (e.g., a pause from the robot is reflected promptly on the WebApp: the page should switch from the Study Session in progress view to the Study Session on break view almost instantly). 

Due to the length and fragility of the ribbon connector between the display screen and the RPi, we have decided to limit the neck rotation to a range of 180 degrees. Translational motion is also limited for the same reason. Therefore, by the interim demo we only intend to have the rotational motion, and depending on the flexibility of the ribbon connector we will limit or drop the translational motion. 

Interim demo goals:

Mahlet: 

  1. I will have working audio localization within, or close to, the 5-degree margin of error in simulation. 
  2. I plan to have the correct audio input signals in each microphone, and integrate this input with the audio processing pipeline in the RPi.
  3. I will integrate the servo motor with the neck motion, and make sure the robot’s neck motion is working as desired.
  4. I will work with Shannon to ensure TTS functionality through gTTS and will do testing on pyttsx3 directly from RPi. 

Shannon: 

I aim to have the Study Session feature fully fleshed out for a standard Study Session, such that a user can 

  1. Start a Study Session on the WebApp (WebApp sends information to robot which starts timer)
  2. Pause it on the robot (and it reflects on the WebApp)
  3. When goal duration has been reached, the robot alerts WebApp and WebApp displays appropriate confirmation alert 
  4. User can choose to end the Study Session or continue on the WebApp (WebApp should send appropriate information to RPi) 
    1. RPi upon receiving information should either continue timer (Study Session continue) or display happy face (revert to default display)*
  5. At any point during the Study Session, user should also be able to end the Study Session (WebApp should send information to RPi)
    1. RPi upon receiving information should stop timer and then display happy face (revert to default display)*

* – indicates parts that Jeffrey is in charge of but I will help with

I also plan to have either the pyttsx3 library working properly so that the text-to-speech feature works on the WebApp, or the gTTS feature working with minimal (<5 s) processing time by pre-processing the user input into chunks, generating mp3 files for each chunk in parallel, and playing them sequentially.
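A minimal sketch of that chunking idea is below, assuming gTTS for synthesis and an mpg123 call for playback; the chunk size, file names, and playback helper are assumptions rather than our final implementation.

```python
# Sketch: split the text, synthesize chunks in parallel, play the mp3s in order.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from gtts import gTTS

def play_mp3(path: str) -> None:
    subprocess.run(["mpg123", "-q", path])    # assumes mpg123 is installed on the RPi

def synth_chunk(args):
    idx, chunk = args
    path = f"chunk_{idx}.mp3"
    gTTS(chunk, lang="en").save(path)         # convert this chunk of text to mp3
    return path

def speak(text: str, words_per_chunk: int = 40) -> None:
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    with ThreadPoolExecutor() as pool:
        # Generation overlaps across chunks; playback stays sequential and in order.
        for path in pool.map(synth_chunk, enumerate(chunks)):
            play_mp3(path)
```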

For the RPS Game feature, I aim to ensure that the RPi can receive starting game details from the WebApp and that the WebApp can receive end game statistics to display appropriately.

Jeffrey: 

The timer code is able to tick up properly, but I have to ensure that pausing the timer (the user can pause the timer using the keypad) is synced with the WebApp. Furthermore, the time that the user inputs is stored in the WebApp in a dictionary. I currently have code that extracts the study time from the duration (a key in the dictionary) and passes it into the study timer function, so the robot can display the time counting up on the DSI display; a sketch of this step is shown below. One mitigation is to put the pause functionality on the DSI display itself, as opposed to GPIO input -> RPi5 -> WebApp. By using the touchscreen, we decrease reliance on hardware and make it easier to debug via Tkinter and software.
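Here is an illustrative sketch of that step (the dictionary keys and duration format are assumptions): parse the duration sent by the WebApp and drive a Tkinter label that counts up once per second.

```python
# Sketch: extract the study duration from the session dictionary and count up on screen.
import tkinter as tk

session = {"duration": "00:45:00", "break": "00:10:00"}   # example payload from the WebApp

def to_seconds(hhmmss: str) -> int:
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

goal_seconds = to_seconds(session["duration"])            # study time extracted from the dict

root = tk.Tk()
label = tk.Label(root, font=("Helvetica", 48))
label.pack()

def tick(elapsed: int = 0) -> None:
    label.config(text=f"{elapsed // 60:02d}:{elapsed % 60:02d}")
    if elapsed < goal_seconds:
        root.after(1000, tick, elapsed + 1)                # schedule the next tick in 1 s

tick()
root.mainloop()
```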

 

The RPS code logic is functional, but it needs to follow the flow chart from the design report: from the confirm “user is about to play a game” screen -> display rock/paper/scissors (using Tkinter) -> display the Win/Loss/Tie screen, or reset if no input is confirmed. Our goal is to use the keypad (up/down/left/right arrows) connected to the RPi5 to take in user input and output the result accordingly; a sketch of the decision step is below. One mitigation is that we can use the touchscreen of the DSI display to take in user input directly on the screen and send it to the WebApp. 
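A sketch of that decision step might look like the following; the arrow-to-move mapping is an assumption, and the Tkinter screen transitions would hook in where the comments indicate.

```python
# Sketch of the RPS decision step driven by keypad input.
import random

KEY_TO_MOVE = {"left": "rock", "up": "paper", "right": "scissors"}  # assumed mapping
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_round(key: str) -> str:
    user = KEY_TO_MOVE.get(key)
    if user is None:
        return "reset"                 # no confirmed input -> back to the confirm screen
    robot = random.choice(list(BEATS))
    if user == robot:
        return "tie"
    return "win" if BEATS[user] == robot else "loss"

# e.g. result = play_round("left"); then show the Win/Loss/Tie screen in Tkinter
# and send the result to the WebApp over the WebSocket.
```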

Integration goals: 

  1. The TTS will be integrated with the speaker system. Mahlet and Shannon are working on the TTS and Jeffrey will be working on outputting the TTS audio through the speaker. 
  2. For the WebApp, Jeffrey needs to take in user input from the WebApp (stored as JSON), parse it, and send the inputs to functions such as the timer counting up, or the reverse, where a user action is sent to the WebApp (e.g., the user chose rock and won that round of RPS). 

 

There have not been any changes to our schedule.

Team Status Report for 11/02/2024

The most significant risk is the TTS functionality. The pyttsx3 library has been causing issues when text input is directed from the WebApp: a “run loop has already started” error occurs when trying to read new text after the first submission. As such, we have looked into alternatives, including Google TTS (gTTS). Upon trying gTTS out, we realized that while it does work successfully, it takes a significant amount of time to read long pieces of text aloud. Short texts (<20 words) take an insignificant amount of time (2-3 s), but longer pieces of text, such as 1000 words, can take up to approximately 30 s, and a standard page of text from a textbook took roughly 20 s. These delays are significant and are due to the fact that gTTS converts all of the text to mp3 first, and the mp3 is then played on the WebApp, whereas pyttsx3, the TTS engine we previously wanted to use, converts the text to speech as it reads the input and so performs much better.

We also tried installing another TTS library (the Python library simply called TTS) as a potential alternative. We found that it is very large: when we tried installing it on a local computer, it took hours and still wasn’t complete. We are concerned about the size of the library since we have limited space on the RPi. This library supports 1100 languages, which is part of why it takes so long to install. We plan to keep it in mind as a potential alternative, but as of now, the gTTS library is the better option.

One potential risk with the DSI display to RPi5 connection is that we aren’t able to connect via the HDMI port; our goal is to use the MIPI DSI port. The Amazon listing includes an example video of connecting the display directly to the RPi5, which shows that the display is driver-free and compatible with the RPi signal (the RPi OS should automatically detect the display resolution). The display is 800×480 pixels; if our port isn’t working, we can set the resolution directly in the config file with hdmi_cvt=800 480 60 6, which specifies the horizontal and vertical resolution in pixels, the refresh rate in hertz, and the aspect ratio, respectively. 

As an update to the concern in our previous report about not having built the robot base: this week, we managed to laser-cut the robot base out of wood. Since the base is designed to be assembled and disassembled easily, it allows easy access to parts and modification of the circuit. For photos and more information, refer to Mahlet’s Status Report.  

There are no changes to our schedule.

Mahlet’s Status Report for 10/26/2024

This week I worked on the forward audio triangulation method with the real-life scale in mind. I limited the bounds of the audio source to 5 feet from each side of the robot’s base and placed the microphones at their closer, real spacing, using accurate physical units so the approximation is meaningful. With this setup, and knowing the sound source location, I was able to pinpoint the source of the audio cue. I also stepped over the grid at a finer resolution to get a closer approximation, which keeps the error in the direction the robot turns toward low. 

I randomly generate the audio source location, and below are some of the simulations for this. The red circles denote the microphone positions and the cross indicates the audio source.

After this, I pivoted from audio triangulation and focused on other tasks: setting up the Raspberry Pi, testing the TTS with Shannon, and learning about the WebSocket connection methodology. I joined Shannon and Jeffrey’s session when they discussed the WebSocket approach and learned about it there.

While setting up the Raspberry Pi, I ran into some issues when trying to SSH into it; setting up folders and the basics, however, went well. One task for next week is to reach out to the department for more information about prior connections with this Raspberry Pi. It is already connected to the CMU Secure and CMU Devices networks, but it doesn’t seem to be working on the CMU Devices network. I tried registering the device to CMU Devices, but it seems it was registered prior to this semester. I aim to figure out the SSH issue over the next week. In the meantime, we can still work with the RPi using a monitor, so this is not a big issue. 

After this, I worked on text-to-speech with Shannon, using the pyttsx3 library. The intent is for the WebApp to read various texts back to back through the text/file input mechanism. The library works by initializing a text engine and using engine.say() to read the text input. This works when running the app for the first time; however, from the second submission onward it gets stuck in a loop. Working around this with the built-in engine.stop() requires re-initializing the text engine multiple times, which causes the WebApp to lag. As a result, Shannon and I have decided to look into more TTS libraries that can be used with Python, and we will also try testing the TTS directly on the RPi first instead of on the WebApp.
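For reference, a minimal sketch of what we tried with pyttsx3 is below; it works for a single submission, but calling it again from the WebApp while the engine’s run loop is still active produces the “run loop has already started” error described above.

```python
# Minimal pyttsx3 sketch: works once, but repeated calls from the WebApp hit the
# "run loop has already started" error while the engine's loop is still active.
import pyttsx3

engine = pyttsx3.init()          # initialize the text engine once

def read_aloud(text: str) -> None:
    engine.say(text)             # queue the text
    engine.runAndWait()          # blocks until speech finishes
```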

My progress is on track; the only setback is the late arrival of ordered parts. As described in the team weekly report, I will be using the slack time to make progress on assembling the robot and integrating systems. 

Next week I will be working on finalizing the audio triangulation, working with Shannon to find the optimal TTS functionality, and working with Jeffrey to build the hardware.

Team Status Report 10/12/2024 / 10/19/2024

The most significant risk as of now is that our team is slightly behind schedule: we should be completing the build of the robot base and individual component testing, along with the implementation on the RPi. To manage this risk, we will use some of the allocated slack time to catch up on these tasks and ensure that our project stays on track overall. Following the completion of the design report, we were able to map out the trajectory of each individual task. Some minor changes were also made to the design: the todo-list feature was removed from the WebApp because it felt non-essential and was a one-sided WebApp feature, and the robot’s neck now has only rotational motion along the x-axis in response to audio cues, plus y-axis (up and down) translation for a win during the RPS game. We decided on this change because we wanted to reduce the range of motion for the servo horn that connects the servo mount bracket to the DSI display. By focusing on the specified movements, our servo motor system will be more streamlined and more precise in turning towards the direction of the audio cues.

Part A is written by Shannon Yang

The StudyBuddyBot (SBB) is designed to meet the global need for accessible, personalized learning by being a study companion that can help structure and regulate study sessions and incorporate tools like text-to-speech (TTS) for auditory learners. The robot’s accompanying WebApp ensures that it can be accessed globally by anyone with an internet connection, without requiring users to download or install complex software or pay exorbitant fees. This accessibility helps make SBB a universal solution for learners from different socioeconomic backgrounds.

With the rise of online education platforms and global initiatives to support remote learning, tools like the StudyBuddyBot fill a crucial gap by helping students manage their time and enhance focus regardless of geographic location. If something similar to the pandemic were to happen again, our robot would allow students to continue learning and studying from the comfort of their home while mimicking the effect of them studying with friends. 

Additionally, as mental health awareness grows worldwide, the robot’s ability to suggest breaks can help to address the global issue of burnout among students. The use of real-time interaction via WebSockets allows SBB to be responsive and adaptive, ensuring it can cater to students across different time zones and environments without suffering from delays or a lack of interactivity.

Overall, by considering factors like technological accessibility, global learning trends, and the increasing focus on mental health, SBB can address the needs of a broad, diverse audience.

Part B is written by Mahlet Mesfin

Every student has different study habits, and some struggle to stay focused and manage their break times, making it challenging to balance productivity and relaxation. Our product, StudyBuddyBot (SBB), is designed to support students who face difficulties in maintaining effective study habits. With features such as timed study session management, text-to-speech (TTS) for reading aloud, a short and interactive Rock-Paper-Scissors game, and human-like responses to audio cues, SBB will help motivate and engage students. These personalized interactions keep students focused on their tasks, making study sessions more efficient and enjoyable. In addition, SBB uses culturally sensitive dialogue for its greeting features, ensuring that interactions are respectful and inclusive.

Study habits vary across different cultures. For example, some cultures prioritize longer study hours with fewer breaks, while others value more frequent breaks to maintain focus. To accommodate these differences, SBB offers two different session styles. The first is the Pomodoro technique, which allows users to set both study and break intervals, and the second is a “Normal” session, where students can only set their study durations. Throughout the process, SBB promotes positive moral values by offering encouragement and motivation during study sessions. Additionally, the presence of SBB creates a collaborative environment, providing a sense of company without distractions. This promotes a more focused and productive study atmosphere.

Part C was written by Jeffrey Jehng

The SBB was designed to minimize its environmental impact while still being an effective tool for users. We focus on SBB’s impact on humans and the environment, as well as how its design promotes sustainability. 

The design was created to be modular, so a part that wears out can be replaced instead of replacing the whole SBB. Key components, such as the DSI display screen and the Raspberry Pi, were selected for their low power consumption and long life span, reducing the need for replacement parts. To be even more energy efficient, we will implement conditional sleep states on the SBB to ensure that power is used only when needed. 

Finally, we emphasize using recyclable materials, such as acrylic for the base and eco-friendly plastics for the buttons, to reduce the carbon footprint of the SBB. By considering modularity, energy efficiency, and sustainability of parts, the SBB can be effective at assisting users while balancing its functionality with these environmental concerns.

Mahlet’s Status Report for 10/12/2024

This week, I focused mainly on the design report aspect. After my team and I had a meeting, we split up the different sections of the report fairly, and proceeded to work on the deliverables. 

I worked mainly on the audio triangulation, the robot neck motion (components included), and the robot base design. In the design report, I wrote the use-case requirements for the audio response and the robot base, and I made the final block diagram for our project. I then worked on the design requirements for the robot’s dimensions and the audio cue response mechanism. After identifying these, I covered the essential tradeoffs behind some of our component choices, such as the Raspberry Pi, the servo response, the material for the robot’s body, and the microphones for audio input. Finally, I worked on the system implementation for the audio cue response and the unit and integration testing of all these components. 

Following our discussion, I finalized the bill of materials, and provided risk mitigation plans for our systems and components. 

In addition to this, I was able to spend some time discussing the audio response methodology and approach with Professor Bain. After implementing the forward audio detection system (i.e., knowing the location of the audio source and the locations of the microphone receivers), my goal was to work backwards, without knowing the location of the audio source. From this meeting and further research, I settled on the following approach, and I will be working on it in the coming week. More detail on this implementation can be found in the design report.

The system detects a double clap by continuously recording and analyzing audio from each microphone in 1.5-second segments. It focuses on a 1-second window, dividing it into two halves and performing a correlation to identify two distinct claps with similar intensities. A bandpass filter (2.2 kHz to 2.8 kHz) is applied to eliminate background noise, and audio is processed in 100ms intervals.

Once a double clap is detected, the system calculates the time difference of arrival (TDOA) between microphones using cross-correlation. With four microphones, it computes six time differences to triangulate the sound direction. The detection range is limited to 3 feet, ensuring the robot only responds to nearby sounds. The microphones are synchronized through a shared clock, enabling accurate TDOA calculations, allowing the robot to turn its head toward the detected clap.
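A hedged sketch of the per-pair TDOA computation is below, using SciPy’s band-pass filter and cross-correlation; the sample rate and filter order are placeholder values, and the direction estimate itself would combine the six pairwise results.

```python
# Sketch: band-pass one microphone pair and convert the cross-correlation lag
# to a time difference of arrival. Repeating this for all six pairs gives the
# inputs to the direction estimate.
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate

FS = 44100                                   # sample rate (Hz), placeholder
SOS = butter(4, [2200, 2800], btype="bandpass", fs=FS, output="sos")

def tdoa(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Time difference of arrival (seconds) between two microphone channels."""
    a = sosfiltfilt(SOS, sig_a)
    b = sosfiltfilt(SOS, sig_b)
    corr = correlate(a, b, mode="full")
    lag = np.argmax(corr) - (len(b) - 1)     # lag in samples
    return lag / FS
```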

I am a little behind on schedule as the parts are not here yet. I will be working on building the robot base with Jeffrey, and once that is complete, I will test the audio triangulation with the microphones and the RPi after the necessary preparations, within the coming week. 

Mahlet’s Status Report for 10/05/2024

This week, my primary focus was gathering data for the design report and completing tasks related to audio localization.

For the audio localization, I used MATLAB to simulate and pinpoint an audio source on a randomly generated 10×10 grid. I arranged the microphones in a square (2×2) configuration and randomized the location of the audio source. By calculating the distance between each microphone and the audio source, and considering the speed of sound (approximately 343 m/s), I determined the time delays relative to each microphone.

I applied the Time Difference of Arrival (TDOA) method. For each pair of microphones, the set of points with a given difference in arrival time forms a hyperbola (a hyperboloid in 3D). I repeated this process for every microphone pair, and the intersection of these curves provided a reasonable estimate of the audio source’s location. In MATLAB, I looped over the grid at integer locations; for each candidate location I used Euclidean distances to predict the distance to each microphone and calculated the corresponding TDOAs using the speed of sound. By comparing the predicted TDOAs with the actual time delays, I estimated the error in the localization process.
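The snippet below is a Python analogue of that grid search (a sketch under assumed microphone coordinates, not the exact MATLAB script): for each candidate grid point it predicts the TDOAs from geometry and keeps the point whose predictions best match the measured delays.

```python
# Sketch: grid-search TDOA localization on a 10x10 area with a 2x2 mic square.
import numpy as np

C = 343.0                                                    # speed of sound (m/s)
mics = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)     # assumed mic positions (m)
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # six microphone pairs

def predicted_tdoas(src: np.ndarray) -> np.ndarray:
    d = np.linalg.norm(mics - src, axis=1) / C               # time of flight to each mic
    return np.array([d[i] - d[j] for i, j in pairs])

def localize(measured: np.ndarray, step: float = 0.1) -> np.ndarray:
    xs = np.arange(0, 10 + step, step)                       # finer step -> better accuracy
    best, best_err = None, np.inf
    for x in xs:
        for y in xs:
            err = np.sum((predicted_tdoas(np.array([x, y])) - measured) ** 2)
            if err < best_err:
                best, best_err = np.array([x, y]), err
    return best

# e.g. measured = predicted_tdoas(np.array([6.3, 2.7])); localize(measured)
```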

The following figure illustrates the results, where ‘X’ represents the audio source, and ‘O’ marks the microphone positions. Additionally, I will include the relevant equations that informed this approach.

 

Currently, I am facing an issue with pinpointing the exact location of the source. To address this, I plan to refine the grid resolution by using a smaller step size, which should allow for greater accuracy. I will also calculate and display the approximate error in the final results. So far, I have a general idea of the audio source’s location, as indicated by the dark blue line, and I will continue working to pinpoint the exact position. Once I achieve this, I will conduct further simulations and eventually test the system using physical microphones, which will introduce additional challenges.

I am slightly behind on the project schedule. By next week I aim to finalize the audio localization section of the design report, along with the remaining parts of the report, in collaboration with my team. I had also planned to set up the robot neck rotation servos by this week; this hasn’t been done either. We will be finalizing the bill of materials this week, and I will start on the servos as soon as the components arrive. To make up for this, I will spend some time over fall break working on it.

According to the Gantt chart, Jeffrey and I had planned on building the robot by the end of this week. This hasn’t been completed yet, but the CAD design is done. This week we will meet to discuss space constraints and make decisions accordingly.

Team Status Report 10/5/2024

After the design review, our team sat down and discussed whether we should do some pruning of our features, following Professor Tamal’s advice. As such, we have made some changes to our design. We have decided to remove two features – the ultrasonic sensor and the photoresistor – from our robot design. These changes address the significant risk of implementing too many features and not having enough time to test and properly integrate them with each other; doing so also provides more slack time to address issues. We also discussed further specifics regarding the microphones we will have on our robot. One potential risk to mitigate is the speaker and servo motor design. We plan to start our implementation with a speaker that can fit in the robot body along with servo motors that provide left/right translation coupled with up/down z-axis translation. We would fall back to this plan if the robot is unable to do a more swivel-type motion, so that the robot can still maintain interactivity without its movements being too difficult to program.

Changes to existing design: Removal of Ultrasonic Sensor and Photoresistor

The purpose of the ultrasonic sensor was to sense the user’s presence and decide whether to keep the timer running during a study session. However, this use case clashes with the text-to-speech (TTS) use case: if the user is using TTS and leaves the desk area to wander around, the sensor would pause the timer, the study session, and the TTS, even though the user did not intend for them to be paused. Even if it were possible to let the user continue listening to the generated speech, it would prevent the user from walking around while studying. Removing the sensor allows a more flexible study style. We will replace it with a timer pause/play button on the robot’s leg: if the user needs to quickly step away from the desk, they can click the button to pause the timer, and they can also walk around or fidget while studying. Furthermore, this avoids having to add additional features such as an alert asking the user whether they are still there, since in practice the sensor could eventually stop noticing a user who is sitting very still.

As for the photoresistor, the use case was that when the robot is already turned on but has gone into an idle state, the robot should be able to “wake up” and greet the user if the user turns a light on. We felt that this use case was too niche and, although a nice perk to have, not integral to the design of the robot. Fundamentally, the robot is meant to help a student study and also provide entertainment when the student is tired or needs a break. Thus, this feature did not feel crucial to include in our project. We believe it is more beneficial for us to remove it and focus on making our other features better instead. 

Changes to existing design: Addition of an idle state 

An additional feature that our team devised was a sleep state for the robot to conserve power and prevent the Raspberry Pi from overheating. If the user leaves in the middle of a study session or doesn’t return after a break reminder, the robot will enter a sleep state after 10 minutes of inactivity, upon which the robot’s DSI display will feature a sleeping face screensaver. We believe that a sleep state is useful both to save power and to pause all processes, and if users choose to return to a study session, the robot will be able to wake up on command and resume processes such as study timers and interactive games immediately.

Specification of existing design: Microphones

We have decided to use two pairs of compact ½” cardioid condenser microphones, each placed at a corner of the robot to pick up sound within a 3-foot radius. This will not incur additional costs, as they will be borrowed from the ECE department. 

Update to schedule: 

Removal of testing and integration for the ultrasonic sensor and photoresistor to allow for more integration time for all other components. Otherwise, everything remains the same.

Shannon’s Weekly Report 9/28/2024

This week, I focused on narrowing down the specifics of the robot and the WebApp with my team. We wanted to have a clear idea of exactly what our robot and the WebApp would look like. We discussed in depth what our robot dimensions should be and concluded that the robot should be roughly 12-13 inches in height to be at eye level on a desk. Since the LCD display will be around 5 inches, the base will have a height of about 7 inches. We also discussed the feet dimensions, which came out to be 2.5 inches wide to account for the 3 rock-paper-scissors buttons and 1 inch in height to account for the buttons sticking out. Then, I led the discussion around what the WebApp should look like, what pages we should have, and what each page should do. We decided on 4 main pages:

  • a Home page displaying the most recent study sessions and todo lists,
  • a Timer page that allows the user to set timers for tasks and a stopwatch to time how long they take to do tasks,  
  • a Focus Time/Study Session page where the user can start, pause, and end a study session, and view statistics/analyze their study sessions,
  • a Rock-Paper-Scissors page, where the user can start a game with the robot.

Following our discussion, I have started working on the Timer page for our WebApp. I have finished the basic timer and stopwatch features, so a user can now start a timer and start and stop a stopwatch; attached is a screenshot of this. I also plan on adding a feature where previous timer and stopwatch timings are recorded, with tags the user can add to each activity.

 

According to the Gantt chart, I am on target. 

In the next week, I’ll be working on:

  • Completing the Timer Page
  • Coding up the Focus Time/Study Session Page
  • Fully finalizing a plan on how to integrate the robot with the WebApp

Mahlet’s Status Report for 9/28/2024

This week, I worked on the CAD design of our StudyBuddy robot using SolidWorks. After discussing with my team the space constraints and the components we will need to integrate into the robot, we decided on general dimensions. The base box is 8 in x 7 in x 6 in, and the head is 6 in x 6 in x 5 in. The DSI display screen will be attached to the head in the designated extrusion, as shown in the CAD drawing. The legs will contain buttons to power on the robot, pause and continue timers if necessary, and interactively play rock-paper-scissors. The output of the buttons from both players will be displayed on the DSI display.

Considering that the directional microphones cannot reliably pinpoint the exact direction of a sound on their own, I will start by creating simulations to see how the audio arrives at the different corners of the robot. I am still planning to use a MEMS microphone array in combination with the directional microphones. Following the previous week’s feedback, I finalized the microphones and will start creating simulation models to conceptualize the system behavior. 

I am on track with this week’s progress. I have identified the microphones and servo motors we will be using for the robot. In addition, I have borrowed (at no cost) a photoresistor, an ultrasonic sensor, and a temperature and humidity sensor for testing purposes. 

By next week, I would like to do more research on the audio triangulation mechanism and its mathematical derivation. I will set up the text-to-speech libraries on a computer and figure out the integration with the Raspberry Pi, and set up the speakers from the RPi by week 7 with Shannon. I will also meet with Jeffrey to analyze motor specifications, including voltage, power, and torque, using datasheets.