Month: March 2023

Anita’s Status Report for 3/25

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week was admittedly a bit of

Team Status Report for 03/25

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

Right now, a big risk for our team is completing pitch detection integration with the web application. This element of

Anna’s Status Report for 03/25

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).  

This week, for capstone, we spent part of our class time on the ethics discussion. During the discussion, other teams noted a few things we hadn’t considered. Namely, one group pointed out that different accents or dialects may affect the pitch of singing. While we had thought about accommodating people with different vocal ranges by using relative pitch, we had not considered accents or dialects. I am unsure whether fully researching this and figuring out a solution is in scope for our project, but it is certainly something we should keep in mind during design.

Next, the web application is now templated for the integration of the pitch detection algorithm. Once the recording begins, sound information is collected every 100 ms, which triggers the ondataavailable handler. In this handler, both a full version of the recording and a 10-byte slice of the end of the recording are sent, via POST request, to the Django backend, where an insertion point has been set up for Kelly to perform pitch detection. Both audio files are extracted from the POST request. After this, a JSON response is sent back to the page’s JavaScript via an XMLHttpRequest object. Once pitch detection is added, this JSON will contain the pitch, and the value will be updated on the screen.
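As a rough illustration, the backend insertion point could look something like the sketch below, assuming the two audio blobs arrive as multipart form files. The field names and the detect_pitch stand-in are placeholders, not our final API:

```python
# views.py -- minimal sketch of the Django endpoint described above.
# "full_audio", "tail_slice", and detect_pitch are hypothetical names.
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def pitch_endpoint(request):
    full_audio = request.FILES["full_audio"]  # entire recording so far
    tail_slice = request.FILES["tail_slice"]  # trailing slice for live feedback
    pitch_hz = None  # placeholder until Kelly's pitch detection is plugged in
    return JsonResponse({"pitch": pitch_hz})
```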

Additionally, I implemented updating the graph to display notes and lyrics. They update chunk by chunk, with each chunk corresponding to a time range in the audio file and containing pitch data over that period, as well as the corresponding lyrics.
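To make the chunk format concrete, a dummy dataset along these lines might look like the following (field names are illustrative, not our final schema):

```python
# Hypothetical shape of one song's chunk data: each chunk covers a time
# range in the audio file and carries the target notes plus its lyrics.
song_chunks = [
    {"start_ms": 0,    "end_ms": 2000, "notes": ["C4", "C4", "G4"], "lyrics": "Twin-kle twin-kle"},
    {"start_ms": 2000, "end_ms": 4000, "notes": ["A4", "A4", "G4"], "lyrics": "lit-tle star"},
]
```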

The chart being used is a stepped line graph made with Chart.js. I turned off the interactivity and grid features to make it more visually appealing. The values being plotted at the moment are dummy values, but they demonstrate how we want the final version to look. We may want to add vertical and horizontal axis labels, as well as a second dataset displaying the user’s performance. That second dataset can be added once the pitch detection is integrated.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?  

I am on schedule. There are insertion points for integrating the pitch detection, as well as basic implementations of all the features, which will be adjusted as we keep working on combining both aspects of the project.

What deliverables do you hope to complete in the next week?

Next week, I am hoping to have some pitch detection in the web application. At that point, I’d like to combine the graph and recorder, which will also let us add the user’s pitch graphics. The step after that will be to create datasets, like the dummy one being used now, for all of the chosen songs.

Kelly’s Status Report for 3/25

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week, I ramped up to the

Anna’s Status Report for 03/18

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week, we worked on the ethics

Team Status Report for 03/18

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

As of now, the current biggest risk is still the pitch tracking algorithm, especially since we just pivoted it to read in a .wav file rather than a continuous stream of audio input. With our pitch tracking algorithm, our biggest concerns are latency and accuracy. 

Let’s talk about latency. As of now, Kelly’s .wav-input pitch tracking algorithm processes a 13-second audio clip in about 0.0296 seconds (roughly 30 milliseconds) from start to end. This is comfortably under our latency cap, and a 13-second chunk is also a much longer .wav input than we expect to be sending to the backend at once. Therefore, we are not currently concerned about meeting our latency goal, but further testing will need to be done once the algorithm is integrated with our web application frontend. In general, if we find that the latency is too long, we have a couple of options: limit the duration of the .wav file we send to the backend and/or limit the amount of processing we do on the .wav signal.
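For reference, this kind of measurement can be taken with a lightweight Python wrapper like the sketch below (track_pitch here is just a stand-in name for Kelly’s .wav pitch tracker, not its real name):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Example usage:
# pitches, elapsed = timed(track_pitch, "c_major_scale_slow.wav")
# print(f"{elapsed:.4f} s")  # ~0.03 s for the 13-second clip above
```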

Now let’s talk about accuracy. Kelly tested the new pitch tracking algorithm on six .wav files split between two scales: C Major Scale – Slow and C Major Scale – Fast. The three .wav files for each scale version (fast and slow) were: straight piano input, humming input, and note-name input (i.e. Do Re Mi Fa Sol …). The results of these tests can be found in the following graphs:

[Graphs: pitch tracking results for the slow and fast C major scales, comparing piano, humming, and note-name inputs]

Overall, the pitch tracking is looking extremely accurate when comparing the piano input to a singer’s input. Obviously, some filtering will need to be done on this signal, especially when words are sung, as the air expended on consonants tends to throw off the pitch tracker. However, this testing leaves us pretty satisfied with the current setup. If further accuracy is needed, we will consider processing the signal further with some sort of bandpass filter in order to eliminate the interference of noise.
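If we do go the bandpass route, a minimal sketch of that filtering step, assuming SciPy is available (the 80–1100 Hz band is an illustrative guess at the singing range, not a tuned value):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bandpass(samples: np.ndarray, sample_rate: int,
             low_hz: float = 80.0, high_hz: float = 1100.0) -> np.ndarray:
    """Attenuate noise outside a rough human singing range."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
    return sosfilt(sos, samples)
```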


Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?  

The addition of MediaStream and the removal of PyAudio were made this week. MediaStream is going to provide a continuous recording stream of the user’s voice on the web application frontend, in JavaScript. This was previously handled by PyAudio, but we couldn’t figure out how to properly communicate a Python PyAudio stream to the JavaScript frontend. MediaStream will now send a .wav file to the aubio backend, which will then send back the pitch it finds.
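For context, reading a .wav file and extracting pitch with aubio follows this general pattern (buffer and hop sizes here are aubio’s typical defaults, not our tuned values):

```python
import aubio

def track_pitch(path, buf_size=2048, hop_size=512):
    """Return one detected pitch (Hz) per hop of the .wav file."""
    src = aubio.source(path, samplerate=0, hop_size=hop_size)  # 0 = use file's rate
    detector = aubio.pitch("yinfft", buf_size, hop_size, src.samplerate)
    detector.set_unit("Hz")
    pitches = []
    while True:
        samples, read = src()
        pitches.append(float(detector(samples)[0]))
        if read < hop_size:  # last (partial) hop reached
            break
    return pitches
```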

This change was necessary in order to get pitch tracking to display on our web application. However, it comes at the cost of about a week of progress. Since we left five weeks of slack for pitch tracking, we just tapped into one of them, and overall the progress of our project was not affected too much.


This is also the place to put some photos of your progress or to brag about a component you got working.

This week was very coding heavy, but we’re quite proud of the new pitch tracking algorithm and our lovely new graphs of testing.

Kelly’s Status Report for 3/18

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week, I focused on the ethics

Anita’s Status Report for 3/18

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

So after figuring out the high level

Team Status Report for 03/11

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

There was a lot of feedback from peers and the instructors on the design review presentation that highlighted potential risks. We’ll address the feedback and potential risk mitigation strategies in this section. There were three main categories of feedback: testing, hardware, and pitch detection method.

Testing

  • How will latency be determined?
  • “I think that they should also consider testing how quickly the audio will be processed and how quickly the feedback is given so that they are not being told [incorrect information]”

Faulty tests can jeopardize this project, as these tests are what guide our modifications and define what “success” means. Thus, it is imperative that we create robust tests for our most important use-case requirements. Latency testing over such short intervals proves rather difficult, as the testing process itself can slow the program down and skew the results. For example, continuously displaying output in a terminal drastically slows down the program, so we must avoid that. We must therefore rely on a testing method that is lightweight and does not slow our program down too much.

After hearing this feedback, we have devised a more robust latency testing plan. Python has a time module that lets us capture the current time down to the millisecond; similarly, in JS, we can use the Date object. In the Python backend, we will scatter benchmarks that record when a certain operation was performed. Examples of such benchmarks include: when input audio is received, when the audio is processed, when feedback is generated, and when information is sent to the server.

After collecting the timestamps, we will stop the program and do some post-processing of the data to determine latency. This data will tell us whether we are hitting our latency use-case requirement for real-time feedback, as well as whether we are giving the user correct feedback about their note.
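A minimal sketch of this benchmarking approach, with stage names following the examples above:

```python
import time

marks = []  # (stage_name, timestamp) pairs, post-processed after the run

def mark(stage):
    marks.append((stage, time.perf_counter()))

mark("input_audio_received")
# ... process audio ...
mark("audio_processed")
# ... generate feedback ...
mark("feedback_generated")

# Offline post-processing: latency between consecutive stages, in ms.
for (s1, t1), (s2, t2) in zip(marks, marks[1:]):
    print(f"{s1} -> {s2}: {(t2 - t1) * 1000:.1f} ms")
```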

  • “In regard to testing metrics, one potential modification is being too high [large] of a range for a user to receive feedback—though I was hoping there is some data or justification out there as to why 5% is realistic—and also as to why 0.25 seconds for note detection is a fast enough turnaround time, and if so, why it can be achieved (might it take longer?)”
  • “The test metrics should also look at the user experience.”
  • “User satisfaction and improvement in singing abilities/confidence can be assessed to determine whether the app is actually beneficial.”

We have justified the 250-millisecond latency threshold in our design review report. However, we do agree with the feedback that it will take more than scientific and literary evidence to justify it: we need actual users’ experiences. We plan to test this product on users and obtain their feedback on whether the latency is too high or the feedback too soft. This plan has also been included in our design review report, and we will run such tests as soon as we can in case we need to make modifications.

Hardware

  • “The hardware being used is over $300 which is a lot for a student vocalist to afford. No considerations were mentioned for handling audio from different sources”
  • “The only concern I have is that they require the user to wear a specific headset which seems a little unreasonable.”
  • “If this is a free application, what will happen if a user can’t purchase the high quality tools being used to record audio? What if their microphones have a lot of noise? Etc.”

There were many concerns about the expensive hardware and how that squares with our goal of making this app accessible and appealing to the casual singer. We agree that the average casual user will not be willing to fork over 300 dollars for a headset and interface. However, this was a change initiated by our advisors, who said that we should initially focus on a “proof of concept.” The hardware does a lot of noise filtering for us and preserves the quality of the input sound. We are using this expensive, high-quality hardware so that we can focus on the main meat of this project: pitch detection and feedback generation. If we hit our MVP and use-case requirements early, one of our reach goals is to see if we can modify our pitch detection algorithm to work with a regular laptop microphone, as there would be no additional expense to the user in that case.

Pitch Detection Method

  • “What would happen with people who are sopranos vs baritones?”
  • “Need to come up with a solution for singers with deeper pitch.”

It would be unfair to limit our user demographic to those who can sing only in a certain range; this would put our goal of making this app accessible to casual users at risk. We plan to address this risk by allowing multiple “correct” pitches for each scale and melody, spanning octaves. The feedback mechanism will treat a C3 as equal to a C4, equal to a C5, and so on. As long as the user is singing the correct relative pitch, the feedback mechanism will not ding them.
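One simple way to get this octave-agnostic comparison is to reduce a detected frequency to its pitch class, as in this sketch:

```python
import math

def pitch_class(freq_hz):
    """Map a frequency to its pitch class (0 = C ... 11 = B), ignoring octave."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)  # A4 = 440 Hz = MIDI 69
    return round(midi) % 12

# C3 (130.81 Hz), C4 (261.63 Hz), and C5 (523.25 Hz) all reduce to class 0
assert pitch_class(130.81) == pitch_class(261.63) == pitch_class(523.25)
```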

  • “My past experience[s] with PyAudio have not been good, so I would suggest further testing with PyAudio. My previous [experiences] with PyAudio were with audio analysis on songs with multiple layers, so it may be more effective at single note pitch detection”
  • “From personal experience in a project, pyaudio doesn’t work very well and seems buggy. It may require extra effort or practice to use unless someone on the team already has the experience.”

As of right now, we thankfully haven’t run into too many issues with PyAudio. However, a buggy module does pose a huge risk to the success of our project, as there is no easy way to fix a module’s internal implementation of a feature ourselves. PyAudio mainly acts as the interface between the user’s input vocals and the pitch detection; the module is not doing the pitch detection itself. Kelly has done lots of testing on the input stream from PyAudio and has found no such issues so far. Maybe it is the quality of our hardware that is mitigating this issue, but the risks and concerns brought up by our peers do not seem too relevant as of now.

Extrapolating this feedback about PyAudio to the other module we are using, aubio: we also have mitigation strategies in case aubio is buggy. We haven’t personally encountered, or seen much discourse online about, aubio being buggy, but since we are using aubio for pitch detection (a large aspect of our project), we should consider the potential for it to become buggy down the line. To sum up the risk mitigation plan, it is to either a) do pre-processing on the input vocals, or b) switch modules. More details justifying these two plans can be found in our design review report.

  • “Seems a little unrealistic to be moving from Python to C[++] given the short time frame therefore it does not seem like a good mitigation strategy.”

High latency due to Python’s shortcomings could prove to be a huge risk to our project. We will try to optimize the efficiency of our code (details of this are included in the design report). We will attempt to move our backend to C++, but if that move is not feasible, as the peer feedback comment suggests, we have plans to adjust the structure of our design implementation to change where feedback is processed and given. We will discuss and flesh this out more if latency becomes a serious issue.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?  

As we fleshed out our design in the design review report, we made some changes to our use-case requirements.

  • Pitch Detection Accuracy: 95% –> 87%

As our visual feedback consists of simply plotting the user’s input note on a five-line staff, we have decided to aim for an 87% accuracy rate as a starting point. This means that our pitch detection algorithm correctly identifies the input note in most cases but occasionally misidentifies a note. We believe that this visual representation of feedback gives a little more leeway and is thus more forgiving of minor pitch detection errors. We must strike a balance between accuracy and latency, and in this case we decided to lower our required accuracy rate.

  • Latency in post-song analysis: 10s –> 5s

According to the same article by the Nielsen Norman Group, a 10-second delay is the upper limit on how long a user will stay engaged with an app. We believe that 5 seconds is more than enough time to process any post-song statistics, so we have tightened the latency requirement from 10 seconds to 5 seconds.

  • User Interface Requirements: new!

We have added use-case requirements for our user interface. This was necessary so that we can measure how seamlessly the user is able to use our app. The requirements include task time, completion rate, and task satisfaction.

This is also the place to put some photos of your progress or to brag about a component you got working.

We all mainly worked on the design review report this week. You can find the design review report under “Design Review.”

As you’ve now established a set of sub-systems necessary to implement your project, what new tools have your team determined will be necessary for you to learn to be able to accomplish these tasks?

There have been no changes to the backend tools. On the frontend, we are now using Chart.js to create our visual feedback.

Anita’s Status Report for 3/11

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

Much, if not all, of this week