Team Status Report for 4/29

No major design changes were required for our system.

One major risk we have identified is that songs played at varying tempos are sometimes not transcribed correctly; for example, the transcription might skip some notes if the song is played too fast.

We have been focusing on testing our system for most of the week, and these are the tests that have been performed.

SNR Tests 

We added white noise to the audio signal to see how it would affect the transcription. We kept increasing the amount of noise so that the SNR decreased, in order to find the best SNR value at which to reject audio.
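
A rough sketch of the noise-injection step (the helper name and the exact noise model are illustrative, not necessarily our exact code):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db):
    """Return a copy of `signal` with white Gaussian noise mixed in at `snr_db`."""
    signal_power = np.mean(signal.astype(float) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))      # SNR_dB = 10*log10(Ps/Pn)
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Sweep decreasing SNR values and re-run the transcription on each noisy copy:
# for snr_db in [80, 70, 60, 50, 40]:
#     transcribe(add_noise_at_snr(clean_audio, snr_db))     # transcribe() is a placeholder
```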

Pitch Tests

Looking at the output of our pitch processor, we can determine whether any incorrect pitches were detected. We compute the pitch accuracy by dividing the number of correct notes by the total number of notes. If the percentage is greater than or equal to 95%, the test passes; otherwise, it fails.
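
A minimal sketch of the accuracy computation (names are illustrative):

```python
def pitch_accuracy(detected_pitches, expected_pitches):
    """Percentage of positions where the detected pitch matches the expected pitch."""
    correct = sum(1 for d, e in zip(detected_pitches, expected_pitches) if d == e)
    return 100.0 * correct / len(expected_pitches)

# Example: 19 of 20 notes correct -> 95.0, which just meets the requirement.
# assert pitch_accuracy(output, ground_truth) >= 95.0
```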

Rhythm Tests

Looking at the output of the rhythm processor, we can determine the number of notes and rests detected. The output is an array of 1s and 0s: a 0 means a rest has occurred, while a 1 means a note is being played. We count the number of detected notes and rests and compare them against the expected counts. We want the accuracy to be greater than or equal to 90% for a test to pass; otherwise, it fails.
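
A small sketch of how the 1s and 0s are tallied (assuming the output format described above):

```python
def count_notes_and_rests(rhythm_output):
    """Count note entries (1s) and rest entries (0s) in the rhythm processor output."""
    notes = sum(1 for v in rhythm_output if v == 1)
    rests = sum(1 for v in rhythm_output if v == 0)
    return notes, rests

# Example: [1, 1, 0, 1, 0] -> (3 notes, 2 rests); these counts are then compared
# against the counts expected for the song, with 90% accuracy needed to pass.
```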

Some results for 3 songs can be seen in the table below:

Alejandro’s Status Report for 4/29

During this week, I have been focusing on testing, for example on SNR threshold values. I added white noise to audio signals and observed how the transcription output changed as I increased the amount of noise. An SNR threshold of 60 dB seems to be a good value, because below that the transcription's pitch accuracy falls under our 95% requirement. I also tested the rhythm and pitch accuracy on simple songs, and they meet our requirements. However, on more complex songs with varying tempos the system might not be as accurate.


We should be on track. Next week will mainly involve working on the papers and the poster and getting everything ready for demo day.

Alejandro’s Status Report for 4/22

This week was mainly about testing, which means playing audio files and checking how they were transcribed. For example, I created a function to make sure that the right number of notes is placed in a given stave, so that the total duration of the notes does not exceed the duration of the stave, since this was an issue we were having. Another issue was that the time signature always displayed as 3/8; I fixed it so that it now displays the time signature selected by the user.
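
The stave check itself boils down to a duration comparison; a conceptual sketch (the real code sits next to the VexFlow drawing logic, and the names and beat units here are assumptions):

```python
def fits_in_stave(note_durations_in_beats, beats_per_stave):
    """True if the given notes do not exceed the capacity of one stave."""
    return sum(note_durations_in_beats) <= beats_per_stave

# e.g. in 4/4 a stave holds 4 beats, so [1, 1, 0.5, 0.5, 1] fits,
# but [1, 1, 1, 1, 0.5] would push the last note into the next stave.
```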

I have also been trying to fix the PDF-generation issue since Kumar was not able to implement it, and I have also found it challenging. Normally you should be able to use jsPDF to render a given div element into a PDF. I tested this with a simple div containing a sentence, and it generated a PDF containing that text. However, when we add the div containing the SVG element created by VexFlow, the PDF displays as all blank. We will have to look into different libraries.

We might be a little behind due to this PDF issue, but if it does not work I do not think it is worth stressing too much over, since in my opinion generating a PDF is not a very important feature for our project.

Next week will mostly be testing, working on the PDF issue, and getting the final papers done.

Alejandro’s Status Report for 4/8

Progress

I realized that when reading the time signature from the form options, what gets processed in Python is the fraction's value as a decimal. Therefore, I had to make a switch statement and give 6/8 a different value than 3/4 when it is selected, since otherwise they would both evaluate to the same value.
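
The underlying problem is that 6/8 and 3/4 both evaluate to 0.75 as a decimal; a sketch of the kind of explicit mapping that keeps them distinct (the exact option strings in our form may differ):

```python
# Map the form's time-signature option to an unambiguous (numerator, denominator) pair
TIME_SIGNATURES = {
    "4/4": (4, 4),
    "3/4": (3, 4),   # would otherwise collapse to 0.75...
    "6/8": (6, 8),   # ...just like 6/8
}

def parse_time_signature(form_value):
    return TIME_SIGNATURES[form_value]
```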

I also implemented a function in the backend that calculates the number of staves we will need to draw depending on the duration of the audio file.
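
Roughly, the calculation divides the audio's length in beats by the beats per stave (the tempo handling here is simplified; treat the names as illustrative):

```python
import math

def num_staves(audio_duration_s, tempo_bpm, beats_per_stave):
    """Number of staves needed to hold an audio clip of the given duration."""
    total_beats = audio_duration_s * tempo_bpm / 60.0
    return math.ceil(total_beats / beats_per_stave)

# e.g. a 30 s clip at 120 bpm in 4/4 -> 60 beats -> 15 staves
```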

I developed the front-end code for the save-PDF button we added, and I helped Kumar with ideas for the implementation of the backend function that turns the VexFlow output into a PDF when the user clicks the “save-pdf” button.

As we discussed in the Team Status Report, we are having issues with some audio files. Therefore, I tried recording the output of Twinkle Little Star with my phone and sending that audio file to our system to process. It seems like some audio files can be handled, but others give us errors. We are still not sure why this is occurring, so part of next week's focus will be figuring out why.

I also cleaned up code across the project by removing dead code and adding documentation, to make sure we are all on the same page and have an easier time understanding each other's code.

Finally, I edited the code that draws the notes using VexFlow. Before, we looped through all of the notes for every stave, and when we reached the notes assigned to that stave, we drew them into it. This is inefficient because we scan the full note list each time, even though we only need the notes belonging to the stave we are drawing at that iteration. Therefore, I changed the way the notes are processed and made a dictionary that maps each stave index to its notes. Now we access a stave's notes with a single lookup, which has reduced the complexity; the system did indeed get faster.
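
The drawing code itself is JavaScript, but the data-structure change is language-agnostic; this sketch shows the grouping idea (the key name stave_index is an assumption):

```python
from collections import defaultdict

def group_by_stave(notes):
    """Group note dicts by their stave index so each stave is drawn with one lookup."""
    by_stave = defaultdict(list)
    for note in notes:
        by_stave[note["stave_index"]].append(note)
    return by_stave

# Drawing stave i is now a single dictionary access instead of a scan over every note:
# for note in by_stave[i]: draw(note)
```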

We should be on track in terms of progress. 

For next week, I will be focusing on figuring out the issues with the audio files, as well as helping my teammates with whatever they need on their current tasks, since I know Kumar at least was facing issues implementing the backend of the save-PDF button.

Testing

We will need to test the output of our integrator system. We have already started testing it by recording simple piano audios and feeding them into it. We will first need to test that the number of notes in a stave is correct. We can test this by looking at the time signature and ensuring that the total length of the notes in a stave does not exceed what the time signature allows, and that the note pushed into the next stave could not have been added to the previous one. We will also need to test that the number of notes is correct. We can do this by listening to the audio, counting the notes we hear, and comparing that to the number of untied notes our integrator outputs. We should also test the accuracy of the note pitches by recording the pitches played and comparing them to the pitches our integrator outputs; since we said we want it to be >= 90% pitch accurate, as long as it meets this threshold we should be good. We should also test smaller functions, like the function I created to compute the number of staves. This can be done by looking at an audio file, calculating by hand how many staves it needs, and comparing that to the output of our function.
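
For the note-count comparison, a small script along these lines would do (the dict fields are assumptions about our integrator output, not its exact format):

```python
def count_untied_notes(notes):
    """Count notes that are not tied continuations, e.g. {"pitch": "C4", "tied": False}."""
    return sum(1 for n in notes if not n.get("tied", False))

# heard = 12                                   # counted by ear from the recording
# assert count_untied_notes(integrator_output) == heard
```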

Team Weekly Status Report for 4/8

Risks

While attempting to test some home-made audio files, we found that the app wouldn’t accept files of the type we were inputting. This was confusing as they seemed to be the same file types as the original tests we were using. It seems like the method of recording, such as the microphone used or the placement of the mic, can affect whether or not a recording is suitable for the app. We will need to determine what the key factors are, and whether or not we can modify our code to allow for a wider range of recording types.

Design Changes

There are currently no major design changes to our project.

Progress

Alejandro’s Status Report for 4/1

This week I implemented the SNR-rejection system of our project. It will display a red alert in the front end of the app if the audio is rejected for having too much noise. If the SNR requirement is satisfied, the audio is processed as usual (see here).
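
The rejection decision itself is just a threshold check; a minimal sketch (the helper name is illustrative, and the 60 dB figure is the value our later SNR tests settled on):

```python
SNR_THRESHOLD_DB = 60   # tuned through the white-noise tests described in the 4/29 report

def should_reject(snr_db):
    """True when the estimated SNR of the upload is too low to transcribe reliably."""
    return snr_db < SNR_THRESHOLD_DB

# A rejected upload is answered with a flag that the front end turns into the red alert;
# otherwise the audio continues on to the pitch and rhythm processors as usual.
```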

I also focused on testing our systems. I first created a number of monophonic piano audio files in GarageBand. The first audios were pretty simple and short, so we could use them to see how well our systems work initially. Then, I made a couple of longer and more complex audios to test our systems with something more realistic.


I ran these files on our systems, and they seem to work fine with the rhythm and pitch processors but not with the integrator. At first, one issue was that the integrator was outputting all rests. I realized the issue was that we had changed the code to normalize the signal before putting it through the rhythm processor, and therefore I had to lower the parameter that sets the height a signal must reach to count as a peak, since the normalized signal only ranges from 0 to 1.
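
A sketch of the parameter change (the 0.3 threshold is illustrative, not our tuned value, and the real onset detection has more steps than this):

```python
import numpy as np
from scipy.signal import find_peaks

def detect_onsets(audio):
    normalized = np.abs(audio) / np.max(np.abs(audio))   # amplitudes now lie in [0, 1]
    peaks, _ = find_peaks(normalized, height=0.3)        # height had to drop from a raw-amplitude value
    return peaks
```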


After fixing that, the integrator is still not very accurate right now. It seems like some note detections are inaccurate in terms of pitch. We also discussed with Tom that the way we are currently integrating the systems might not be the best, so we might have to re-implement the integration system.


This week I will be focusing on testing and on trying to find what issues are going on with our integrator, as well as possible fixes. I think we are on track.


Alejandro’s Status Report for 3/25

First, I had to work on the ethics assignment with my team.

After that, I decided to make the front end of the website look even better. I made sure we were only using Bootstrap and barely any custom CSS, so that resizing the window would not affect the look of the website. I also had to add code so that our backend can read the values set by the user in the form when selecting a clef, time signature, and audio file. Before, our code was not reading these values correctly, and now this should be fixed. This information will be sent to the rhythm and pitch processors. I also added the copyright footer to our website.
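
A sketch of how the view reads those values once the form validates (field and template names are assumptions, not necessarily the exact ones in our code):

```python
from django.shortcuts import render
from .forms import AudioForm   # the form introduced in the 3/18 report

def transcribe_view(request):
    form = AudioForm(request.POST or None, request.FILES or None)
    if request.method == "POST" and form.is_valid():
        clef = form.cleaned_data["clef"]                      # e.g. "treble" or "bass"
        time_signature = form.cleaned_data["time_signature"]
        audio_file = form.cleaned_data["audio_file"]
        # ...hand these off to the rhythm and pitch processors...
    return render(request, "transcribe.html", {"form": form})
```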


Finally, I had to write code in the views.py file that allows the backend to send all the correct information to the front end so that we can use it with VexFlow to display the music sheet. This required some changes to the integrator, as discussed in the team weekly status report. I moved the integrator from Python to the JavaScript part of our code, since there was a bug where sending a list containing a Python class instance to JavaScript would not work. Therefore, we now just send the pitches and the rhythm output to the JavaScript and call our integration function there.
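
The payload handed to the front end now looks roughly like this: two lists of primitives instead of one list of Note structs (key names are assumptions):

```python
import json

def build_transcription_context(pitches, rhythm):
    """Package the two processor outputs so the template can hand them to JavaScript."""
    return {
        "pitches": json.dumps(pitches),   # e.g. ["C4", "E4", "G4", "C5"]
        "rhythm": json.dumps(rhythm),     # e.g. [1, 1, 0, 1]
    }

# The JavaScript side parses both lists, runs the integration function, and then
# draws the resulting notes with VexFlow.
```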

I also made sure that VexFlow displays the clef and the time signature correctly in the front end.

Finally, I made sure that the integration system was working properly. It seems that it is now able to produce an output of notes, which means we should be able to test it next.

I would say my progress is on schedule.


Next week we will be focusing on testing the integration system as well as the other systems. We should also get started on the SNR rejection system if time allows. 


Team Weekly Status Report for 3/25

Risks

It seems that when we enter an audio file into our system, it takes rather long to transcribe even short audios. For example, it takes 24 seconds to transcribe an audio containing 4 notes that lasts around 14 seconds. This could become a bigger issue with longer audios.


Design Changes

Our design has two sub-processors, one for determining the pitch of each note in the audio and one for determining the rhythm of each note, followed by an integration engine. The sub-processors are implemented in Python, and we initially planned to implement the integrator in Python as well, generating a list of Note structs and sending them to the front end to be transcribed. However, we found that sending information packaged in a custom data structure meant the front end could not effectively parse the information inside it. We realized we would have to send the front end information made up of primitive data types such as strings or ints. So, we decided to integrate the pitch and rhythm processors' outputs after the information is sent to the front end, since we can instead send the rhythm and pitch outputs separately as a list of integers and a list of strings. The method of integration is unchanged; the primary difference is that the HTTP response contains two outputs instead of one.


Progress

[Front-End Updated]

[Integrator Change]

Team Weekly Status Report for 3/18

Risks

There are not too many significant risks right now. One is figuring out how to integrate the rhythm and pitch processors into a single array of notes while including the rests, which might be challenging with VexFlow since there is not a lot of documentation for the rests aspect of it. We plan on testing with simple files and writing simple scripts to see which data structure would most easily accommodate this.


Design Changes

Since rests are written a little differently than notes, we might have to change our current design for the Note data structure. However, as of right now, no major design changes are required.


Progress

UI Skeleton

Output from audio file containing a C scale:

Alejandro’s Status Report for 3/18

First of all, I restructured our Django project. The reason is that we had created a lot of unnecessary Django applications for every subset of our app: for example, we had one Django application for our rhythm processor, one for our pitch processor, etc. We do not need all these applications, just one for the music transcription application. Therefore, I deleted some folders and files, changed some import statements to handle the new structure, and made it look cleaner.

I also created a Django form called “AudioForm” which allows the user to set the user-defined values in the front end. These include the audio file, the type of clef, and the time signature. This is how it looks…
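
A sketch of the form (the choice values shown are illustrative; the actual options may be labelled differently):

```python
from django import forms

class AudioForm(forms.Form):
    audio_file = forms.FileField()
    clef = forms.ChoiceField(choices=[("treble", "Treble"), ("bass", "Bass")])
    time_signature = forms.ChoiceField(
        choices=[("4/4", "4/4"), ("3/4", "3/4"), ("6/8", "6/8")]
    )
```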

I also made sure that the rhythm processor can handle audio files that contain more than a single channel. Since the rhythm processor is mainly looking for onsets, I converted the multi-channel array into a 1D array: at each sample I iterated through the channel values, kept the one with the greatest absolute value, and appended it to the 1D array of audio data that will be scanned for onsets.
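
A sketch of that conversion with NumPy (assuming the audio array is shaped (num_samples, num_channels)):

```python
import numpy as np

def to_mono_max_abs(audio):
    """Collapse multi-channel audio to 1D by keeping, per sample, the value with the largest magnitude."""
    if audio.ndim == 1:
        return audio                                   # already mono
    loudest = np.argmax(np.abs(audio), axis=1)         # index of the loudest channel per sample
    return audio[np.arange(audio.shape[0]), loudest]   # keep that (signed) value
```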

I would say my progress is on schedule.

Next week I plan to focus on writing code for the integration of the pitch and rhythm processor, as well as adding some styling to the front end of the website since it currently contains no CSS.