Team Weekly Status Report for 3/11

Design Changes

During the Design Review process, we determined that our method of testing pitch would not reliably measure how well the transcription meets our goal. We have reworked our testing process to be more user-focused: we will transcribe common tunes (e.g., Mary Had a Little Lamb), have average musicians read the resulting sheet music, and have them rate its accuracy on a scale of 1 to 10.

Risks

We started implementing the VexFlow JS library this week, and found that the library's internal documentation creates complications when integrating it with our web app. We are still working toward a solution, and we may have to reconsider our method of converting the Note object models into PDF format.


Deliverables

UI Skeleton with C-Scale Audio Input and Note List Output

Alejandro’s Status Report for 3/11

This week I looked into which parameters would work best for the rhythm processor to detect onset peaks in an audio signal. SciPy's find_peaks function accepts several optional parameters, and we think the most useful ones for our rhythm processor will be the peak distance parameter and the height parameter.

The peak distance parameter will be set to the length of our time-interval window, so that at most one onset is detected per window. Since our rhythm processor iterates through the signal with a sliding-window approach, we never need to look for more than one peak in the current window.

The height parameter is currently set to the lowest peak value I observed when playing a note, based on multiple recordings. However, we may have to tune this value during system testing to maximize the rhythm processor's performance across different inputs.
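As a rough illustration of how these two parameters fit together (the threshold value here is a placeholder we are still tuning):

import numpy as np
from scipy.signal import find_peaks

MIN_NOTE_HEIGHT = 0.1  # placeholder: lowest peak value observed for a played note

def has_onset(window):
    """True if this time-interval window contains an onset peak."""
    # distance spans the whole window, so find_peaks returns at most one peak
    peaks, _ = find_peaks(np.abs(window), height=MIN_NOTE_HEIGHT,
                          distance=len(window))
    return len(peaks) > 0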

After that, my team and I mainly focused on writing the design document for the project. 

I think our progress is a little behind overall, due to our team facing sick days and heavy workloads from other classes. However, those classes should hopefully ease up in the coming weeks, allowing us to catch up, and I think we will be fine with respect to the timeline.

Next week, Aditya and I will focus on integrating the rhythm and pitch processors.

Alejandro’s Status Report for 2/25

Met with the team and prepared for the presentation. I practiced for the design presentation by reading over all our content, ensuring it contained all the required information, and internalizing everything so that I could speak to the audience with barely a glance at the slides (except perhaps to read specific numbers for data).

I looked into how we can detect the SNR of our signal by doing some research online. It turns out SciPy used to have a function that calculated the SNR of a signal, but it was deprecated. However, the code for that function is still available online, so we can use it to detect the SNR in our audio input validation system (here).
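For reference, the deprecated function computed the SNR as the ratio of the signal's mean to its standard deviation; a self-contained copy along those lines is:

import numpy as np

def signaltonoise(a, axis=0, ddof=0):
    """SNR as mean/std along an axis, matching the deprecated scipy.stats version."""
    a = np.asanyarray(a)
    m = a.mean(axis)
    sd = a.std(axis=axis, ddof=ddof)
    return np.where(sd == 0, 0, m / sd)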

I also researched how the rhythm processor should work and how we might implement it (paper). After reading some articles, I concluded it would be best to detect the onsets in a given audio signal to find the start of each new note. Since we are dealing with piano music, this approach makes sense: the onset of a new note is nearly instantaneous.

I decided to use the find_peaks() function from SciPy. I will iterate over the input signal in the time domain, looking at sections of a fixed length (currently 1/8th of a beat) and searching for peaks in each one. If I find a peak in a section, I append a 1 to the found-peak array; if not, I append a 0. The algorithm thus returns an array of 1s and 0s indicating whether each section of the signal contains a peak (the value at index 0 corresponds to the first section of the signal, the value at index 1 to the second section, and so on).

The code currently looks something like this, but it is still being tested: find_peaks() is a little complex, and we will need to agree on parameter values to make it work correctly. The code implements the algorithm and tests it at the same time.
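A minimal sketch of the approach (section_len would be 1/8th of a beat in samples, and min_height is the placeholder threshold we are still tuning):

import numpy as np
from scipy.signal import find_peaks

def detect_onsets(signal, section_len, min_height):
    """Return a list of 1s and 0s: 1 if the corresponding section
    of the signal contains an onset peak, 0 otherwise."""
    found_peaks = []
    for start in range(0, len(signal), section_len):
        section = np.abs(signal[start:start + section_len])
        # distance spans the whole section, so at most one peak is found
        peaks, _ = find_peaks(section, height=min_height, distance=section_len)
        found_peaks.append(1 if len(peaks) > 0 else 0)
    return found_peaks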

It can also be viewed here

Team Weekly Status Report for 2/25

Risks

Implementing parts of the design has revealed many edge cases that result in incorrect output. For example, the frequency processor picks up the signal before the musical input starts and calculates notes as if music were being played. Also, the segmentation method of calculating the Discrete Fourier Transform can involve overlapping parts of the signal, which distorts the measured fundamental frequency.

Design Changes

We’ve made changes to the testing process. In addition to the rhythm and pitch accuracy requirements, we plan to score our website based on how a musician feels about the accuracy of the song. We will have users listen to the reference signal and the tested signal and rate, on a scale of 1-10, how similar the two sound. Our goal is for all users to give a score of 8/10 or higher.

In terms of new work each of us will take on based on the modified schedule, the changes are not significant. Kumar will work on the frontend of the website, as he has to catch up from last week, when he was sick and unable to put in any work. Aditya and Alejandro will work on the same tasks specified in the previous Gantt chart, and Kumar will also write user feedback forms and distribute them to different users. We also added an optimization phase, based on user feedback, that we will all work on.

Updated Schedule

Progress

Team Weekly Status Report for 2/18

Principles of engineering, science and mathematics

  • Short-Time Fourier Transform of a time domain signal. (18-290)
  • Utilization of frequency domain vs time domain to analyze different signals for different purposes (18-290)
    • Time domain for the rhythm processor: we can detect the start of a new note by detecting a sudden drop and rise in energy. This lets us determine the length of each note and when one note ends and the next begins.
    • Frequency domain for the frequency processor: the Short-Time Fourier Transform lets us detect the average frequency over a specific interval (the length of our window size in the STFT), and with this frequency we can determine which note is being played (see the sketch after this list).
  • Signal-to-noise ratio (18-100, 18-290)
    • Our system will reject audio that has a low SNR.
  • Object Oriented Programming (15-112)
    • We will need different objects in our code for the characteristics of the Fourier transforms, notes, audio files, etc.
  • Web Applications (17-437)
    • Our system will require a web application for the users to access the system from their laptops. 
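A minimal sketch of the frequency-domain idea, assuming SciPy's stft() and the standard A4 = 440 Hz reference for mapping frequencies to notes:

import numpy as np
from scipy.signal import stft

def dominant_frequencies(signal, sample_rate, window_samples):
    """Strongest frequency in each STFT window of the signal."""
    f, t, Z = stft(signal, fs=sample_rate, nperseg=window_samples)
    return f[np.argmax(np.abs(Z), axis=0)]  # one frequency per time frame

def freq_to_midi(freq):
    """Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))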

Significant Risks 

Unfortunately, this week we have fallen a little off schedule. One of our team members was sick for the whole week and was therefore unable to make progress on the project. We also had trouble finding software libraries to test our frequency processor against, due to installation issues that took longer than expected. For example, we tried installing a Python library called CREPE that would give us the frequencies in an audio sample; we planned to compare its output with our frequency processor's. The problem is that installing it requires TensorFlow, and installing TensorFlow gave us a lot of issues. After unsuccessfully trying to install CREPE, we moved on and found another library called Parselmouth, which was very easy to install. We will therefore use Parselmouth to test our frequency processor. This library gives us all the frequencies of a specific audio file, so we can compare its output to the output of our own processor.

The risk I see, therefore, is not being able to finish everything on time. However, I think that by putting in more work in the upcoming weeks we should be able to get back on track. Working on the Design Review helped us reorganize ourselves and adjust our plans around this week's setbacks.

Changes made to design 

We have not made any changes to our overall design, but we have further developed the specifics of the sub-processors and the transcription engine. We have determined a method of isolating the pitches in specific segments of the signal, a binary mapping system to determine when a new key press occurs, and a data structure that holds this information in a form the VexFlow library can easily consume. A sketch of that structure follows.
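As a rough illustration (the field names are placeholders, chosen to line up with what VexFlow's StaveNote expects):

from dataclasses import dataclass

@dataclass
class Note:
    pitch: str         # VexFlow key notation, e.g. "c/4"
    duration: str      # VexFlow duration code, e.g. "8" for an eighth note
    start_beat: float  # onset position taken from the binary key-press mapping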

Updated schedule

We have updated our schedule a little, making the process of building the sub-processors more concurrent and adding some sections for the user-testing part of our project.

Photos of progress

Manually examining/testing pitch over time with Parselmouth, compared to SciPy (our previous method). Parselmouth is more accurate and easier to interpret.


Alejandro’s Weekly Status Report for 2/18

The following courses were helpful in the design principles used in our project:

  • 18-290: Fourier Transforms, Frequency Domain vs Time Domain Signals, Signal to Noise Ratio.
  • 15-112: Object Oriented Programming.
  • 17-437: Web Applications.

This week I mainly focused on trying to find software that will help us develop the frequency processor. Last week I found that MATLAB's pitch function would be quite useful to test against our own frequency processor: given an audio signal, it returns all the frequencies found in that signal, in order. We can therefore compare its output with our own frequency processor's to determine how accurate ours is.

The problem is that we need to write this code in Python, since we are building our web app with Django. At first I tried a public library called CREPE, but it was really hard to install, and I was not successful. I tried for two days, but the library required me to install TensorFlow, which caused too much trouble for me as well as for my teammates.

I decided to look for other libraries and found one called Parselmouth. It was easy to install and also provides a pitch function for determining the different pitches in an audio file. The software correctly detected the frequencies in a couple of files we had, so we will use it for testing our frequency processor.
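A minimal sketch of how we extract the pitch track (the file name here is just an example):

import parselmouth

snd = parselmouth.Sound("c_scale.wav")           # example input file
pitch = snd.to_pitch()
frequencies = pitch.selected_array['frequency']  # Hz; 0 for unvoiced frames
times = pitch.xs()                               # frame timestamps in seconds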

These are the outputs of a C-note and a C-scale: 

I have also been working with my teammates on laying out all the content for the Design Review, as well as setting up the database for our Django application.

I would say my progress is on schedule. There have been some modifications to our Gantt chart, and this upcoming week I will focus on researching how to build the rhythm processor, as well as preparing to give the Design Review presentation.

Alejandro’s Status Report for 2/11

This week I helped create the proposal slides for the presentation that Aditya gave on Wednesday.

I was also supposed to think about the design of the data structure for the audio file in our code. I propose the following: the data structure would contain a field for the length of the audio in minutes, the sampling rate of the audio in samples per second, the number of samples taken throughout the whole audio, and a field holding an object with the contents of the Short-Time Fourier Transform of the audio.

In Python, this looks roughly like:

class Audio:
    def __init__(self, length, sampling_rate, samples, stft):
        self.length = length                # audio length in minutes
        self.sampling_rate = sampling_rate  # samples per second
        self.samples = samples              # total number of samples taken
        self.stft = stft                    # Short-Time Fourier Transform object

I also researched some helpful tools we could use for our frequency processor. I found that MATLAB offers a function called "pitch()" that, given an audio file, outputs the frequencies of the audio. This would be ideal for detecting which note frequencies are being played at a given time. This is the output of the function for a C-scale audio, for example:

and this is the output for a constant C note:

The only issue is that when the audio goes quiet we get inconsistent frequencies, as at the beginning and end of the first image. At moments of silence we would therefore have to ignore the output of this algorithm, and this might be a challenge we have to deal with in the future.
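One possible way to handle this, sketched with a placeholder energy threshold:

import numpy as np

def mask_silence(frequencies, frame_energies, min_energy=1e-4):
    """Replace frequency estimates from near-silent frames with NaN."""
    frequencies = np.asarray(frequencies, dtype=float)
    energies = np.asarray(frame_energies)
    return np.where(energies >= min_energy, frequencies, np.nan)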

My progress is on schedule. Next week I will focus on designing the data structure for the Short-Time Fourier Transform, as well as helping my team integrate the frequency processor into the Python back-end.


Team Weekly Status Report for 2/11

The most significant risk for our project right now is the timeline. We have a very strict timeline: we aim to complete much of the research and outlining for the signal processing within the next week (two weeks total, including this week). While we feel this is a realistic goal, the process has proven very time-consuming, and we nearly missed completing this week's research. The difficulties compound when implementing the research in the back-end of our web app, because we have to format the outputs in a way the computer can process, rather than the charts and visual representations we have been working with so far. There are also elements of our signal processing design that may require further work when we reach the coding stage, as Python has limited signal processing libraries compared to MATLAB, where most of our research is done. Aditya found that after determining the parameters of the STFT of the test signal in MATLAB, he had to work out a different set of parameters for the SciPy library's stft() method.


As we have only just completed the design process and are currently on track with our implementation schedule, we have not needed any changes to the existing system design.


These are some pictures of the frequency-detection algorithm Alejandro found in MATLAB. The first image shows audio of a piano playing the C scale, and the second a constant C note. The x-axis is the time domain and the y-axis the frequency domain.


Our project includes considerations of education and economics. We realize that many people might want access to a free, easy-to-use music transcriber. Most transcribers come in the form of applications that require subscriptions; with our web app, anyone can transcribe for free. It would be especially useful for teachers who want to show students the transcription of a specific song being played in class, and it gives people an efficient tool for transcribing short monophonic audio instead of transcribing it manually. Finally, it increases access to music for those facing time, financial, or other barriers. This may be especially helpful to students and teachers in low-income communities, where arts and music programs are often the first to be cut.