Shravya’s Status Report for 12/07/2024

This week, significant effort was spent implementing UART communication between the STM32 microcontroller and an external system, such as a Python script or serial terminal, to control solenoids. The primary goal was to parse MIDI commands from the Python script, transmit them via UART, and actuate the corresponding solenoids based on the received commands.

  1. Description of Code Implementation:
    • The UART receive interrupt (HAL_UART_RxCpltCallback) was set up in main.c to handle incoming data byte by byte and process commands upon receiving a newline character (\n).
    • Functions for processing UART commands (process_uart_command) and actuating solenoids (activate_solenoid and activate_chord) were written and tested.
    • The _write() function was implemented to redirect printf output over UART for debugging purposes.
  2. Testing UART Communication:
    • The Python script (send_uart_data) confirmed successful transmission of the parsed MIDI commands (see screenshot). This suggests my computer is sending the data correctly but the STM32 is not receiving or processing it properly. A minimal sketch of the Python send path appears after this list.
    • Minicom and other terminal tools were used to try to verify whether UART data was being received on the STM32 side. They did not work because I cannot monitor (“occupy”) the port without inhibiting the data being sent; sending data and monitoring data on the same port appear to be mutually exclusive. This makes sense, but I saw online that monitoring a port with a serial terminal is a common way people debug communication protocols, and I don’t see any point in monitoring a port other than the one the communication is occurring on.
  3. Unexpected and oddly specific solenoid activation:
    • Observed that solenoids 5 and 7 actuated unexpectedly at program startup. This happened multiple times when I reflashed, and the movements were intricate and oddly specific: solenoids 5 and 7 simultaneously turned on, off, on (for a shorter duration), off, then on and off again. It looks as if I had hardcoded a sequence of GPIO commands for solenoids 5 and 7, which I most definitely did not.
    • Added initialization code in MX_GPIO_Init to set all solenoids to an off state (GPIO_PIN_RESET).
    • Temporarily disabled HAL_UART_Receive_IT to rule out UART-related triggers but found solenoids still actuated, indicating the issue may not originate from UART interrupts.
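
For reference, here is a minimal sketch of what the Python send path looks like. The port name, baud rate, and exact command strings are illustrative assumptions; only the newline termination is taken from the firmware described above.

```python
import serial  # pyserial

# Hypothetical port, baud rate, and command format, for illustration only.
PORT = "/dev/ttyUSB0"
BAUD = 115200

def send_uart_data(commands):
    """Send newline-terminated command strings to the STM32 over UART."""
    with serial.Serial(PORT, BAUD, timeout=1) as ser:
        for cmd in commands:
            ser.write((cmd + "\n").encode("ascii"))  # firmware buffers bytes until '\n'
            ser.flush()

# Example: actuate solenoid 5, then a chord of solenoids 0, 4, and 7.
send_uart_data(["NOTE 5", "CHORD 0 4 7"])
```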

I have been debugging this for a week. Initially, there were a few mistakes I genuinely did make (some by accident or oversight, some by conceptual misunderstanding): I realised I had to explicitly flush my RX buffer, and I had to build a better mapping from my Python code, which labels the octave of notes as 60-72 (MIDI’s convention), to the 0-12 numbering I use for the solenoids in the firmware. I also noticed one small mismatch in the way I mapped each solenoid in the firmware code to an STM32 GPIO pin.
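
As a concrete illustration of that mapping fix, the conversion from MIDI note numbers to solenoid indices reduces to a fixed offset; a minimal sketch (the real code may use a lookup table instead):

```python
# MIDI note numbers for middle C (60) through the C one octave above (72)
# correspond to solenoids 0-12 in the firmware's numbering.
def midi_note_to_solenoid(midi_note: int) -> int:
    if not 60 <= midi_note <= 72:
        raise ValueError(f"note {midi_note} is outside the supported octave")
    return midi_note - 60
```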

I now feel like I have a good conceptual understanding of UART, of the small portion of auto-generated code that STM32CubeIDE produces (this is inherent to how STM32CubeIDE handles some clock configuration; it is necessary and cannot be disabled), and of the additional functions I’ve written. I definitely knew far less when I started testing my UART code a week ago. Yet I am still stuck, and I may be out of ideas on which part of this code to look at next. I’ve shared my files with Peter, who can be a fresh set of eyes, as well as with a friend of mine who is a CS major. I have an exam on Monday, but after it ends I will work on this in HH1307 with my friend until it functions. I need it working before our poster submission on Tuesday night and video submission on Wednesday night.

Peter’s Status Report from 12/07/2024

This Week

This week was spent creating the mount that holds the solenoids in the case. The first iteration (not pictured) had walls at the base that were too low, so they were extended upwards by 3mm. The mounting holes in the back were also too low, so they were moved up 2mm, which matched the solenoid perfectly when printed.

 Figure 1: Solenoid Bracket

 

The next step, which will be done Sunday, is to space 13 of these in a way that will match up with the piano and attach them all in one case.

 

Next Week

Next week will be spent finalizing the case, doing testing, and completing the deliverables for the Capstone Course.

Team Status Report for 12/07/2024

See Shravya’s individual report for more details on UART debugging progress. In summary, while all UART-related code has been written and running for about a week now, debugging is still underway and taking much longer than estimated. This bottlenecks the accuracy and latency testing Shravya can conduct with the solenoids playing a song fed in from the parser (solenoid accuracy and latency behave as expected when playing a hardcoded dummy sequence, though). Shravya hopes to have a fully working implementation by Monday night, so there is ample time to show functionality in the poster and video deliverables and to conduct formal testing after that. She has arranged to meet with a friend who will help her debug on Monday.

Testing related to hardware:

As a recap, we have a fully functional MIDI-parsing script that is 100% accurate at extracting note events, and we are able to control our solenoids with hardcoded sequences. The final handshaking that remains is connecting the parsing script to the firmware so that the solenoids actuate based on any given song. Once the UART bugs are resolved, we will feed in manually coded MIDI files that cover different tempos and patterns of notes. We will observe the solenoid output and keep track of the pattern of notes played to calculate accuracy, which we expect to be 100%.

  • During this phase of the testing, we will also audio-record the output with a metronome playing in the background. We will manually mark the timestamps of each metronome beat and solenoid press, and use those to calculate latency (see the sketch below).
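
Since latency will be computed from manually marked timestamps, the post-processing is simple arithmetic; a sketch with placeholder values (the real timestamps will come from the audio recording):

```python
# Timestamps in seconds, marked by hand from the recording (placeholder values).
metronome_beats = [0.00, 0.50, 1.00, 1.50]
solenoid_presses = [0.04, 0.55, 1.03, 1.56]

offsets = [press - beat for beat, press in zip(metronome_beats, solenoid_presses)]
avg_latency_ms = 1000 * sum(offsets) / len(offsets)
print(f"average latency: {avg_latency_ms:.0f} ms")
```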

Some tests have already been completed, namely overall power consumption and the functionality of individual circuit components:

  • Using a multimeter, we measured the current draw of the solenoids under the three possible actuation scenarios. As expected, the maximum power consumption occurs when all solenoids in a chord are actuated simultaneously, but even then we stay just under our expected power limit of 9 watts (see the sketch after this list).
  • To ensure the functionality of our individual circuit components, we conducted several small-scale tests. Using a function generator, we applied a low-frequency signal to control the MOSFET and verified that it reliably switched the solenoid on and off without any issues. For the flyback diode, we used an oscilloscope to measure voltage spikes at the MOSFET drain when the solenoid was deactivated. This allowed us to confirm that the diode sufficiently suppressed back EMF and protected the circuit. Finally, we monitored the temperature of the MOSFET and solenoid over multiple switching cycles to ensure neither component overheated during operation.
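
For reference, the power check itself is just multiplying the supply voltage by the measured current and comparing against the 9 W budget; a sketch with placeholder numbers (the actual supply voltage and multimeter readings are in our test notes):

```python
SUPPLY_VOLTAGE = 12.0       # V, placeholder: substitute the actual solenoid supply voltage
worst_case_current = 0.70   # A, placeholder: multimeter reading with a full chord actuated
POWER_BUDGET_W = 9.0        # from our design requirements

power = SUPPLY_VOLTAGE * worst_case_current
print(f"worst-case draw: {power:.1f} W (budget: {POWER_BUDGET_W} W)")
```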

Shravya’s MIDI parsing code has been verified to correctly parse any MIDI file, whether generated by Fiona’s UI or by external means, and it handles all edge cases (rests and chords) that previously caused trouble.

Testing related to software:

Since software integration took longer than expected, we are still behind on formal software testing. Fiona is continuing to debug the software and plans to start formal testing on Sunday (see more in her report). For a reminder of our formal testing plans, see: https://course.ece.cmu.edu/~ece500/projects/f24-teamc5/2024/11/16/team-status-report-for-11-16-2024/. We are worried that we might have to restrict these plans somewhat, specifically by not testing on multiple faces, because many people are busy with finals, but we will do our best to get a complete picture of the system’s functionality. One change we know for certain: our ground truth for eye-tracking accuracy will be based on which button the user is directed to look at rather than on camera playback, for simplicity and to reduce testing error.

Last week, Peter did some preliminary testing on the accuracy of the UI and eye-tracking software integration in preparation for our final presentation, and the results were promising. Fiona will continue that testing this week, and hopefully will have results before Tuesday in order to include them in the poster.

Fiona’s Status Report for 12/07/2024

Book-Keeping

I made a list of the final elements we have to wrap up in the last two weeks of the semester, and met with Shravya and Peter to assign tasks.

New Software Functionalities

I adjusted the software so that the user can open an existing composition without opening an external window. To do so, I implemented logic where the program opens the next available file and loops back around to the first once it reaches the end of the available songs in the folder. This has the obvious drawback that the user may have to cycle through many compositions before reaching the one they want, but it is accessible and simple, two of the most important foundations of our UI.
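
A minimal sketch of that cycling logic, assuming the compositions are stored as .mid files in a single folder (the folder name here is hypothetical):

```python
import os

SONG_DIR = "compositions"  # hypothetical folder holding the user's MIDI files

def next_composition(current_index):
    """Return the path of the next .mid file, wrapping back to the first one."""
    songs = sorted(f for f in os.listdir(SONG_DIR) if f.endswith(".mid"))
    if not songs:
        return None, current_index
    new_index = (current_index + 1) % len(songs)
    return os.path.join(SONG_DIR, songs[new_index]), new_index
```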

Another quick fix, at Peter’s suggestion, was to make the buttons different colors so it is easier for users to differentiate between them and memorize the different commands. Peter also suggested adding a delay between calibration and the start of eye-tracking so the user can orient themselves to the UI, so I added a short delay there as well.

I also created a separate file where users can easily adjust the sampling delay, gain, and number of confirmation iterations for the eye-tracking, so they can tune those parameters to best suit them.
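
The file itself is just a handful of named constants that the rest of the program imports; roughly (names and defaults here are illustrative, not the actual values in the repository):

```python
# eye_tracking_config.py - user-adjustable eye-tracking parameters (illustrative defaults)
SAMPLE_DELAY_MS = 150     # delay between gaze samples
GAIN = 1.0                # scaling applied to the gaze estimate
CONFIRM_ITERATIONS = 5    # consecutive samples needed to confirm a command
```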

I had previously identified a bug in which only the first page of sheet music would appear on the secondary UI, even if the cursor was on another page. I therefore added two more buttons to the screen so the user can move back and forth between the pages of the sheet music. I verified that this worked with sample compositions (longer than one page) from this repository: [1].

I also made it so that the rest and stack buttons would be highlighted when they were selected but before they were performed (so the user is aware of them).

Finally, since I have been having so much trouble highlighting the cursor location on the sheet music, mainly because the on-screen position of any one cursor location varies with sharps/flats, preceding notes, etc., I decided to show the user the current cursor position on the main UI as a number (e.g. “Cursor at note 1.”). This was not our original plan, but I still believe it is a viable way to ensure the user knows where in the piece they are.

Debugging

I fixed a small bug where the C sharp note was highlighted when the high-C note was confirmed and another small bug in which the number of notes did not reset to 0 when a new file was opened.

Then, I did some stress testing to confirm that the logic used to build rests and chords of up to three notes was completely sound. While testing the eye-tracking, I had been running into situations in which the logic appeared to break, but I could not figure out why. To easily test this functionality, I used a different Python script that runs the command responses on key press rather than eye press (see the sketch after this list). Here are the edge-case bugs I identified and fixed:

  • When a rest is placed after a single eighth note, the eighth note is extended to a quarter note. I am fairly confident this is actually a bug in the open-source sheet-music generator we are using [1], because when I played the composition in GarageBand, the eighth notes played where expected even though the sheet-music generator was interpreting them as quarter notes. I further verified this with another online sheet-music generator, which also produced the expected value: https://melobytes.com/en/app/midi2sheet. Since I have already been struggling to modify this open-source program to highlight the cursor location, and this is a very specific edge case, I decided it would be a better use of my time not to fix that bug within the program and instead focus on our code. I have left a note in the README identifying the bug for users.
  • Removing chords did not work successfully. I fixed this bug and verified it did not create a bug for single-note removal. Also, I verified that chords could be removed from anywhere in the piece, not just the end.
  • There were some logic errors with chords meant to have rests before them, which I fixed. I also optimized the three-note chord logic to avoid bugs, although I had not identified any yet.
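
A rough sketch of that key-press harness, assuming the keyboard path calls the same command handler the eye-tracking would (the key-to-command mapping and handler name are hypothetical):

```python
import tkinter as tk

# Hypothetical mapping from keys to the commands the eye-tracking would normally issue.
KEY_TO_COMMAND = {"q": "quarter_note", "r": "rest", "s": "stack_notes"}

def handle_command(command):
    print(f"dispatching: {command}")  # stand-in for the UI's real command handler

def on_key(event):
    command = KEY_TO_COMMAND.get(event.char)
    if command:
        handle_command(command)

root = tk.Tk()
root.bind("<Key>", on_key)  # drive the same code paths with key presses instead of gaze
root.mainloop()
```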

Next Week

After double-checking the eye coordinates, in which there have been some discrepancies, I plan to start performing the latency, accessibility, and accuracy tests on the software. As a group, we will work on the poster and video next week, so my primary goal is to finish software testing ASAP for those deliverables, but I will also work on other elements of them as needed. Additionally, I have to collaborate with Shravya once her UART code is working to integrate our two systems, but I have already set up some code for that, so I do not anticipate it taking too long. For the final report, I will work on the Use-Case & Design Requirements, Design Trade Studies (UI), System Implementation (UI), Testing (Software), and Related Works sections.

References

[1] BYVoid. (2013, May 9) MidiToSheetMusic. GitHub. https://github.com/BYVoid/MidiToSheetMusic

Peter’s Status Report from 11/30/2024

Week Nov 17-23.

Using OpenCV to capture frames and passing them into a Mediapipe function that tracks the user’s irises, I was able to create a basic gaze-tracker that estimates where the user is looking on the computer screen based on the location of their irises. In Figure 1, the blue dot, located to the left of the head, shows the user the estimated location of where they are looking on the screen. Currently, the gaze-tracking has two main configurations. Configuration 1 gives precise gaze tracking but requires head movement to make up for the small movements of the eyes; Configuration 2 requires only eye movement, but the head must be kept impractically still and the estimates are jittery. To improve Configuration 2, the movement of the head needs to be taken into consideration when calculating the user’s estimated gaze.
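
A minimal sketch of that pipeline: OpenCV frames go into Mediapipe Face Mesh with iris refinement enabled, and a dot is drawn on the feed. Note that this only marks the tracked iris position; mapping that movement to an on-screen gaze estimate is the part that differs between the two configurations.

```python
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        # With refine_landmarks=True, indices 468 and 473 are the two iris centers.
        h, w = frame.shape[:2]
        x = int((lm[468].x + lm[473].x) / 2 * w)
        y = int((lm[468].y + lm[473].y) / 2 * h)
        cv2.circle(frame, (x, y), 5, (255, 0, 0), -1)  # blue dot at the iris midpoint
    cv2.imshow("gaze", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```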

 

Figure 1: Mediapipe gaze-tracker

 

Week Nov 24-30.

Tested the accuracy of Fiona’s implementation of the UI and eye-tracking integration. To test the accuracy of the current gaze-tracking implementation in Configuration 1 and Configuration 2, we looked at each command twice, and if a command was not correctly identified, I kept trying to select the intended command until it was selected. Using this method, Configuration 1 had precise control and 100% accuracy. Configuration 2, while achieving 89.6% accuracy in testing, had a very jittery gaze estimate, making it difficult to feel confident about the cursor’s movements, and the user’s head has to be kept too still to be practical for widespread use. Ideally, the user should only need to move their eyes for their gaze to be tracked. As a result, the eye-tracking will be updated next week to take head movement into consideration, hopefully making the gaze estimate smoother.

 

Tools for learning new knowledge

Mediapipe and OpenCV are new to me. To get comfortable with these libraries, I read the online Mediapipe documentation and followed different online video tutorials. Following these tutorials, I was able to discover applications of the Mediapipe library functions that were useful for my implementation.

 

This Week

This week, I hope to complete the eye-tracking, taking head movement into consideration to make what is currently being referred to as Configuration 2 a more viable solution.

Shravya’s Status Report for 11/30/2024

This week, I successfully handled the MIDI parsing of chords and rests, so I can now say that the MIDI parsing is fully functional. I also completed the core implementation of UART communication, which required updates to both my Python code and my STM32Cube firmware. In Python, I added functionality to parse MIDI files and send formatted commands (e.g., notes, chords, and rests) to the STM32 over UART. On the STM32 side, I integrated UART reception code to parse incoming data and trigger the appropriate solenoid actuation logic. I met with Fiona today (Saturday) to get this all working on her laptop, as that is the local machine we will be using in our final demo. One trivial issue that wasted about an hour of our time is that the USB-to-USB-C adapter is unreliable and only works when the cord is positioned at a very specific angle; another hour and a half was spent on actual debugging. Later that night, I bought a new USB-to-USB-C adapter.

I will be the one presenting at the final presentation, and I spent several hours reciting (including in front of parents and friends) and memorising my script. We set an early deadline for ourselves to submit the slides (4 pm) so that I would have plenty of time to practice. In addition, I spent several hours over three days writing the script with Fiona; I contributed about 1000 words, and we had 2400 words when done. I then condensed everything to about 1700 words total, which, according to online speech-speed estimators, is appropriate for a 12-minute speech. I timed myself at around 11:30; hopefully I keep this up at the real event.

Challenges: While the bulk of the logic appears to be functioning as expected, I encountered some issues with the STM32Cube settings, particularly with UART peripheral initialisation and interrupt handling. These bugs caused intermittent failures in receiving commands accurately. I’ve debugged the main issues but need an additional 2-3 hours to fully resolve the remaining inconsistencies and ensure the system operates smoothly.

Next Steps:

  • Finalize and test the STM32Cube configuration to eliminate remaining bugs.
  • Conduct integration tests with the complete system, ensuring seamless communication between the Python parser and STM32 firmware.
  • Begin preparing for final demo and report by collecting test data and documenting system performance.

Overall, the core logic for UART communication does seem to be complete, and I am confident the system will be ready for testing and fine-tuning soon.

To implement UART communication, I needed to learn about both Python serial communication and STM32CubeIDE’s UART configuration. Key areas of new knowledge include:

  1. STM32CubeIDE Peripheral Configuration:
    • I learned how to enable and configure USART peripherals in the STM32Cube device configuration tool, including setting baud rates, TX/RX pins, and interrupt-based data handling.
    • Learning Strategy: I referred to STM32Cube’s official documentation and watched YouTube tutorials for step-by-step guidance. Debugging required forum posts and STM32-specific threads for common pitfalls.
  2. Python Serial Communication:
    • I learned how to use the pyserial library to send and receive data over UART. This included handling issues such as buffer management and encoding data correctly for the STM32.
    • Learning Strategy: I consulted the official pyserial documentation, followed by informal learning via online coding examples on Stack Overflow.
  3. Debugging UART Communication:
    • I learned to use tools like serial terminal emulators (e.g., Tera Term, minicom, PuTTY) to test data transmission and reception. I also gained experience debugging embedded systems using live logs and peripheral monitoring.
    • Learning Strategy: This was mostly trial and error, supported by online forums and STM32 user groups.

Learning Strategies Used:

  • Online Videos and Tutorials: YouTube tutorials were instrumental for understanding STM32Cube setup and Python UART implementation.
  • Community Threads: A Reddit thread helped me understand how to handle the absolute-time vs. relative-time issue in Python Mido (my final solution ended up being different, but the thread was a good start); see the sketch after this list.
  • Documentation and Forums: STM32, pyserial, and Python MIDO official documentation provided technical details, while forums (e.g., Stack Overflow and STM32 Community) helped address specific bugs.
  • Trial and Error: Debugging UART behavior was primarily hands-on, using systematic testing and iterative improvements to isolate and fix issues.
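
For context on the absolute-vs-relative time point mentioned above: mido stores each message’s time as a delta (in ticks) from the previous message, so absolute positions are a running sum. A sketch (the filename is hypothetical):

```python
import mido

mid = mido.MidiFile("example.mid")  # hypothetical file
abs_ticks = 0
for msg in mid.tracks[0]:
    abs_ticks += msg.time  # msg.time is the delta from the previous message, in ticks
    if msg.type == "note_on" and msg.velocity > 0:
        print(f"note {msg.note} starts at tick {abs_ticks}")
```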

Team Status Report for 11/30/24

It has been two weeks since the last status report. In the first week, we completed the interim demo. Then, we started working on integrating our three subsystems. The eye-tracking to application pipeline is now finished and the application to hardware integration is very close to finished. 

There are still some tasks to complete for full integration: the eye tracking needs to be made more stable, and the CAD model for the solenoid case needs to be completed and 3D-printed. We suspect there are some issues with the STM32CubeIDE settings when we attempt to use UART to dynamically control the solenoids based on parsed MIDI data.

Our biggest risk right now is not finishing testing. Our testing plans are extensive (see last week’s team report), so they will not be trivial to carry out. Since we have a shorter time frame to complete them than expected, we might have to simplify our testing plans, but that would not be optimal.

Fiona’s Status Report for 11/30/2024

Last Week

Interim Demo Prep

On Sunday, I worked on preparing my subsystem for the interim demo on Monday. That required some bug fixes:

  • Ensured the cursor would not go below 0 after removing a note at the 0-th index in the composition.
  • Fixed an issue that caused additional notes added onto a chord to be zero-length notes. However, the fix requires notes of the same chord to be the same length, which I will want to fix later.
  • Fixed an issue affecting notes written directly after a chord, in which they were stacked onto the chord even if the user didn’t request it and also could not be removed properly.

I also added some new functionality to the application in preparation for the demo.

  • Constrained the user to chords of two notes and gave them an error message if they attempt to add more.
  • Allowed users to insert rests at locations other than the end of the composition.

Then, before the second interim demo, I fixed a small bug in which the cursor location did not reset when opening a new file.

This Week

Integrating with Eye-Tracking

I downloaded Peter’s code and its dependencies [1][2][3][4] to ensure it could run on my computer. I also had to downgrade my Python version to 3.11.5 because we had been working in different versions of Python; fortunately, I did not immediately notice any bugs after downgrading.

In order to integrate the two, I had to adjust the eye-tracking program so that it did not display the screen capture of the user’s face and so that the coordinates would be received on demand rather than continuously. Also, I had to remove the code’s dependence on the wxPython GUI library, because it was interfering with my code’s use of the Tkinter GUI library [5][6].

The first step of integration was to draw a “mouse” on the screen indicating where the computer thinks the user is looking [7][8].

Then, I made the “mouse” actually functional such that the commands are controlled by the eyes instead of the key presses. In order to do this and make the eye-tracking reliable, I made the commands on the screen much larger. This required me to remove the message box on the screen, but I added it back as a pop-up that exists for ten seconds before deleting itself.

Additionally, I had to change the (backend) strategy by which the buttons are placed on the screen so that I could identify the coordinates of each command for the eye-tracking. To identify the coordinates of the commands and (hopefully) ensure the calculations hold up for screens of different sizes, I had the program compute them from the screen’s width and height at runtime.
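
A sketch of the idea: Tkinter reports the screen size, from which each command button’s bounding box can be derived and tested against the gaze coordinates (the layout fractions here are illustrative):

```python
import tkinter as tk

root = tk.Tk()
screen_w = root.winfo_screenwidth()
screen_h = root.winfo_screenheight()

# Illustrative layout: one button occupying the top-left 25% x 20% of the screen.
button_box = (0, 0, int(0.25 * screen_w), int(0.20 * screen_h))

def gaze_hits(box, gaze_x, gaze_y):
    """Return True if the gaze coordinates fall inside the button's bounding box."""
    x0, y0, x1, y1 = box
    return x0 <= gaze_x <= x1 and y0 <= gaze_y <= y1
```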

I am still tinkering with the number of iterations and the time between each iteration to see what is optimal for accurate and efficient eye tracking. Currently, five iterations with 150ms in between each seems to be relatively functional. It might be worthwhile to figure out a way to allow the user to set the delay themselves. Also, I currently have it implemented such that there is a longer delay (300ms) after a command is confirmed, because I noticed that it would take a while for me to register that the command had been confirmed and to look away.
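
Conceptually, the confirmation logic samples the gaze several times before committing to a command; a simplified sketch of that idea (the real implementation is tied into the Tkinter event loop, e.g. via after(), rather than blocking with sleep, and get_gazed_button is a hypothetical helper):

```python
import time

SAMPLE_DELAY_S = 0.150        # 150 ms between gaze samples
ITERATIONS = 5                # consecutive matching samples required
POST_CONFIRM_DELAY_S = 0.300  # pause after confirmation so the user can look away

def confirm_command(get_gazed_button):
    """Return a button name only if the same button is gazed at for every sample."""
    first = get_gazed_button()
    for _ in range(ITERATIONS - 1):
        time.sleep(SAMPLE_DELAY_S)
        if get_gazed_button() != first:
            return None  # gaze moved away; nothing confirmed
    time.sleep(POST_CONFIRM_DELAY_S)
    return first
```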

Bug Fixes

I fixed a bug in which the note commands were highlighted when they shouldn’t have been. I also fixed a bug that caused the most recently seen sheet music to load on start up instead of a blank sheet music, even though that composition (MIDI file) wouldn’t actually be open.

I also fixed some edge cases where the program would exit if the user performed unexpected behavior. Instead, the UI informs the user with a (self-deleting) pop-up window with the relevant error message [9][10]. The edge cases I fixed were:

  • The user attempting to write or remove a note to the file when there was no file open.
  • The user attempting to add more than the allowed number of notes to a chord (which is three).
  • The user attempting to remove a note at the 0-index.

I also fixed a bug in which removing a note did not cause the number of notes in the song to decrease (internally), which could lead to various issues with the internal MIDI file generation.

New Functionality

While testing with the eye-tracking, I realized it was confusing that the piano keys would light up while the command was in progress and then also while waiting for a note length (in the case that pitch was chosen first). It was hard to tell if a note was in progress or finished. For that reason, I adjusted the program so that a note command would be highlighted grey when finished and yellow when in progress.

I also made it such that the user could add three notes to a chord (instead of the previous two) before being cut off, which was the goal we set for ourselves earlier in the semester.

I made it so that the eye-tracking calibrates on start-up of the program [11], and the user can then request calibration again with the “c” key [12]. Having a key press involved is not ideal because it is not accessible; however, since calibration happens automatically on start-up, hopefully this command will not be necessary most of the time.
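
The key binding itself is a single Tkinter call; a sketch (the calibration routine here is a hypothetical placeholder):

```python
import tkinter as tk

def recalibrate(event=None):
    print("re-running eye-tracking calibration")  # placeholder for the real routine

root = tk.Tk()
root.bind("c", recalibrate)  # pressing "c" re-triggers calibration after start-up
root.mainloop()
```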

Finally, I made the font sizes bigger on the UI for increased readability [13].

Demo

After the integration, bug fixes, and new functionality, here is a video demonstrating the current application while in use, mainly featuring the calibration screen and an error message: https://drive.google.com/file/d/1dMUQ976uqJo_J2QwzubH3wNez9YMPM8Y/view?usp=drive_link

(Note that I use my physical cursor to switch back and forth between the primary and secondary UI in the video. This is not the intended functionality, because the secondary UI is meant to be on a different device, but this was the only way I could video-record both UIs at once).

Integrating with Embedded System

On Tuesday, I met with Shravya to set up the STM32 environment on my computer and verified that I could run the hard-coded commands Shravya made for the interim demo last week with my computer and set-up.

On Saturday (today), we met again because Shravya had written some Python code for the UART communication. I integrated that with my system so that the parsing and UART could happen on the demand of the user, but when we attempted to test, we ran into some problems with the UART that Shravya will continue debugging without me.

After that, I made a short README file for the entire application, since the files from each subsystem were consolidated.

Final Presentation

I made the six slides for the presentation, which included writing about 750 words for the presentation script corresponding to those slides. I also worked with Shravya on two other slides and wrote another 425 words for those slides.

Next Week

Tomorrow, I will likely have to finish working on the presentation. I wrote a lot for the script, so I will likely need to edit that for concision.

There is still some functionality I need to finish for the UI:

  • Writing the cursor on the sheet music. I spent quite a while trying to figure that out this week, but had a lot of trouble with it.
  • Creating an in-application way to open existing files (for accessibility).
  • Adding note sounds to the piano while the user is hovering over it.
  • Have the “stack notes” and “rest” commands highlighted when that option is selected.

And some bug fixes:

  • Handle the case in which there is more than one page of sheet music, and determine which page the cursor is on.
  • Double-check that chords & rests logic are 100% accurate. I seem to be running into some edge cases where stacking notes and adding rests do not work, so I will have to do stress tests to figure out exactly why that is happening.

However, the primary goal for next week is testing. I am still waiting on some things from both Peter (eye-tracking optimization) and Shravya (debugging of UART firmware) before the formal testing can start, but I can set-up the backend for the formal testing in the meantime.

Learning Tools

Most of my learning during this semester was trial and error. I generally learn best by just testing things out and seeing what works and what doesn’t, rather than by doing extensive research first, so that was the approach I took. I started coding with Tkinter pretty early on and I think I’ve learned a lot through that trial and error, even though I did make a lot of mistakes and have to re-write a lot of code.

I think this method of learning worked especially well for me because I have programmed websites before and am aware of the general standards and methods of app design, such as event handlers and GUI libraries. Even though I had not written a UI in Python, I was familiar with the basic idea. Meanwhile, if I had been working on other tasks in the project, like eye tracking or embedded systems, I would have had to do a lot more preliminary research to be successful.

Even though I didn’t have to do a lot of preliminary research, I did spend a lot of time learning from websites online while in the process of programming, as can be seen in the links I leave in each of my reports. Formal documentation of Tkinter and MIDO were helpful for getting a general idea of what I was going to write, but for more specific and tricky bugs, forums like StackOverflow and websites such as GeeksForGeeks were very useful.

References

[1] https://pypi.org/project/mediapipe-silicon/

[2] https://pypi.org/project/wxPython/

[3] https://brew.sh/

[4] https://formulae.brew.sh/formula/wget

[5] https://www.geeksforgeeks.org/getting-screens-height-and-width-using-tkinter-python/

[6] https://stackoverflow.com/questions/33731192/how-can-i-combine-tkinter-and-wxpython-without-freezing-window-python

[7] https://www.tutorialspoint.com/how-to-get-the-tkinter-widget-s-current-x-and-y-coordinates

[8] https://stackoverflow.com/questions/70355318/tkinter-how-to-continuously-update-a-label

[9] https://www.geeksforgeeks.org/python-after-method-in-tkinter/

[10] https://www.tutorialspoint.com/python/tk_place.htm

[11] https://www.tutorialspoint.com/deleting-a-label-in-python-tkinter

[12] https://tkinterexamples.com/events/keyboard/

[13] https://tkdocs.com/shipman/tkinter.pdf

Peter’s Status Report from 11/16/24

This week Fiona and I met with Professor Savvides’s staff member, Magesh, to discuss how we would develop the eye-tracking using computer vision. Magesh gave us the following implementation plan.

  • Start with openCV video feed
  • Send frames to mediapipe python library
  • Mediapipe returns landmarks
  • From landmarks, select points that correspond to the eye-region
  • Determine if you are looking up, down, left, or right
  • Draw a point on the video feed to show where the software thinks the user is looking so there is live feedback.

Drawing a point on the video feed will serve to verify that the software is properly tracking the user’s iris and correctly mapping its gaze to the screen.

So far, I have succeeded in having the OpenCV video feed appear. I am currently bug-fixing to get a face mesh to appear on the video feed, using Mediapipe, to verify that the software is tracking the irises properly. I am using Google’s Face Landmark Detection Guide for Python to help me implement this software [1]. Once I am able to verify this, I will move on to using a face landmarker to interpret the gaze of the user’s irises on the screen, and return coordinates to draw a point where the expected gaze is on the screen.

 

Resources

[1] Google AI for Developers. (2024, November 4). Face Landmark Detection Guide for Python. Google. https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker/python

Team Status Report for 11/16/2024

This week, our team prepared for the interim demo that takes place next week. We met with our advisors, Professor Bain and Joshna, to receive advice on what to work on before our demo. Peter and Fiona also met with a grad student to discuss the eye-tracking implementation. 

Currently, our plan for testing is mostly unchanged; however, it relies on the integration of our subsystems, which has not yet happened. This is why one of our biggest risks right now is running into major issues during integration.

To recap, our current plan for testing is as follows.

  • For the software (user interface and eye-tracking software): We will run standardized tests of different series of commands on multiple users. We want to test with different users of the system in order to vary parameters like face shape/size, response time, musical knowledge, and familiarity with the UI. We plan for these tests to cover a range of scenarios, like different expected command responses (on the backend) and different distances and/or times between consecutive commands. We also plan to test edge cases in the software, like the user moving out of the camera range or attempting to open a file that doesn’t exist.
    • Each test will be video-recorded and eye-commands recognized by the backend will be printed to a file for comparison, both for accuracy (goal: 75%) and latency (goal: 500ms).
  • For the hardware (STM32 and solenoids): We will feed different MIDI files to the firmware and microcontroller. As with the software testing, we plan to test a variety of parameters, including different tempos, note patterns, and the use of rests and chords. We also plan to stress-test the hardware with longer MIDI files to see if there are compounding errors in tempo or accuracy that cannot be observed when testing with shorter files.
    • To test the latency (goal: within 10% of BPM) and accuracy (goal: 100%) of the hardware, we will record the output of the hardware’s commands on the piano with a metronome in the background.
    • Power consumption (goal: ≤ 9W) of the hardware system will also be measured during this test.

We have also defined a new test for evaluating accessibility: we plan to verify that we can perform every command we make available to the user without using the mouse or keyboard, after setting up the software and hardware. An example of an edge case we would test during this stage is ensuring that improper use, like attempting to send a composition to the solenoid system without the hardware being connected to the user’s computer, does not crash the program but is instead handled within the application, allowing the user to correct their use and continue using just eye commands.